s 


/ 


/ /> 


efts 

3/-Z £> 57 


First Annual (Second Semiannual) Technical Report 


To 


Aircraft Guidance and Controls Branch/489 
Guidance and Control Division 
National Aeronautics and Space Administration 
Langley Research Center 
Hampton, VA 23665-5225 


For the Grant 

PARAMETER IDENTIFICATION FOR NONLINEAR AERODYNAMIC SYSTEMS 

Grant No. NAG- 1-1065 


Period Covered: October 23, 1989 to October 22, 1990 


From 


Allan E. Pearson 
Division of Engineering 
Brown University 
Providence, Rhode Island 02912 


Report Prepared By: 


Principal Investigator 
Allan E. Pearson 


Division of Engineering 
Carl Cometta 



Professor of Engineering 
Tele: 401-863-2602 
Date: November 19, 1990 



Executive Officer 
Tele: 401-863-2319 
Date: November 19, 1990 


A-CR-1 ol HlQ) 
FOR NONLINEAR AF 
Report, Oct, 

yrt i v. ) P 


PARAMETER I oW I FI CATION 
YNAHIC SYSTEMS Annuli 
- 22 Oct. 1990 thrown 
19o? * CSCL 01 A 


G 3 /u? 


N 91 - 15 I 26 


Unci as 

on?? 1 '! 


1 


TABLE OF CONTENTS 


Page 

1 . INTRODUCTION 1 

Z LIST OF SCIENTIFIC COLLABORATORS 7 

3. COMPLETED AND CONTINUING RESEARCH 7 

3.1 Modeling Considerations 7 

3.2 Structure Determination 3 

3.3 Data Collinearity 5 

3.4 Frequency Analysis 7 

APPENDIX A: Least Squares Parameter Identification of Nonlinear Differential 

I/O Models 10 

APPENDIX B: Identfiability and Well-Posedness in Nonlinear System I/O Modeling 15 

APPENDIX C: Parameter Identification of Linear Differential Systems Via Fourier 

Based Modulating Functions 17 


APPENDIX D: 


Frequency Analysis Via the Method of Moment Functionals 


45 



- 1 - 


1. Introduction 

This annual technical report covers the first year period of NASA grant NAG 1-1065 
commencing October 23, 1989 and ending October 22, 1990. The research results were par- 
tially described in the semiannual report dated April 26, 1990 and also in the renewal propo- 
sal dated July 20, 1990. However, this report is inclusive of the results obtained during the 
one year period and, in addition, has appended four papers giving details of the algorithms 
and developments forming the background for the current work. 

2. List of Scientific Collaborators 

A. V. Fullerton* Graduate Student Research Assistant 

J. Q. Pan Graduate Student Research Assistant 

A. E. Pearson* Professor and Principal Investigator 

* Received partial support under NAG- 1-1065 

3. Completed and Continuing Research 
3.1. Modeling Considerations 

Our proposed research into the parameter identification for nonlinear aerodynamic sys- 
tems presumes that the underlying model can be arranged into an input/output (I/O) 

differential operator equation of the generic form 1 

2> l (©)F 0 ( M (r),y(O)/ > y(p)^(M(O0'(O)=O a) 

where [u (f)j(O] denote input/output variables assumed to be available as measured data on 
some time interval, 0 denotes a vector of parameters whose value is to be estimated based on 
the given data, Py ip) is a polynomial in the differential operator p =d !dt for each index pair 
(ij), i=l, • • n x , j=l, • • n 2 , and the [E i j(u,y)J 7 i j(u,y),g i (Q)] are given (sufficiently smooth) 
functions of their arguments that depend on the specified model. Additional data-related 
smoothness conditions pertain to the functions gij(t)=F i j(u(t),y(t )) for inexact models in 
which the Fn's are not all constants. 2 * In the case of exact differential operator models, the 
Fij's are all constants, i.e., (1) reduces to the simpler form 

JJi (0)/^ ip )Ej (w it ),y it )H>. ( 2 ) 

ij 

and the algorithm for parameter estimation is especially efficient for this case since the equa- 
tion error can be integrated exactly given any I/O pair to obtain an algebraic function of the 
parameters. (As detailed in Section 3 of Appendix A.) 


1 Although scalar valued, vector versions of this equation may be developed to accommodate mul- 
tivariable system models. Also, extensions to models which arc nonseparable in some of the parame- 
ters is potentially possible, e.g., differential delay equation models with unknown time delays. 

2 Definitions and illustrations of the terms exact, inexact, separable, etc., are given in the attached 

Appendix A paper together with an algorithm for the model (1). 
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The augmented linearized equations for aircraft discussed in Section 4 of Klein 3 (aug- 
mented by aerodynamic derivative coefficients modeled as parametrized functions of aero- 
dynamic inputs and responses) appear to be representable by the model (2), albeit with a large 
number of parameters. Part of our future work will be directed towards these models. 
Although our simulation experience with nonlinear models is not extensive, we can assert 
with some degree of confidence that the algorithm has good noise rejection properties in the 
case of linear system models with additive noise on the data. As detailed in the attached 
Appendix C paper, the noise rejection properties can be explained via the frequency domain 
interpretation of the Fourier based modulating functions. Therein also can be found an exten- 
sion of the algorithm, via the maximum likelihood technique, to stochastic linear models with 
additive white gaussian noise. These results for linear systems show that the bias incurred by 
deterministic least squares can be effectively removed in a high noise- to- signal situation. 
However, with the exception of linear systems, i.e., systems describable by a linear 
differential operator equation like 

2>»-iP ‘y )= X b n-\-iP * “ ( 0 . a 0 =l (3) 

1=0 1=0 

it is recognized that the I/O models (1) or (2) may appear vague and somewhat formidable 
because the more familiar state vector equations like 

(4) 

y(t)=h(x(t),u(t),Q) 

are almost always the starting point into methods for the parameter identification of deter- 
ministic systems modeled by nonlinear ordinary differential equations. This is understandable 
and, therefore, part of our effort has gone into the relationship between the models (1) and (4) 
insofar as parameter identification is concerned. Although this effort is ongoing, one such 
relationship we have investigated is that of the “identifiability” property of these models. 
This notion has been the subject of a number of papers relative to the state equation model 
(4), and we have shown in the attached Appendix B paper that single-valuedness of the g (0) 
function appearing in (1) is a necessary property else the ensuing parameter estimation prob- 
lem will be ill-posed. Thus, defining the vector function g (0)=(g i(0),g 2( 0 )- ' ' >£n,( e )) rela- 
tive to the model (1), the following “injective” property: 

g (0)=g (0* ) if and only if 0=0* (5) 

is a necessary condition else nonuniqueness will plague the parameter estimation problem. 
This conclusion is based on an examination of the metric properties of the I/O model (1) 
when viewed as a function on the parameter space for 0. As pointed out in Appendix B, this 
conclusion is consistent with the results of other investigators when applied to specific state 
equation models that possess an equivalent I/O representation, but the latter model makes the 
determination of this property more transparent than the test for nonidentifiability of state 
equation models. 


3 Klein, V., “Estimation of Aircraft Aerodynamic Parameters from Flight Data,” Prog. Aerospace 
Sci., Vol. 26, pp. 1-77, 1989. 
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3.2. Structure Determination 

We have extended the algorithm for parameter identification described in the Appendix C 
paper to the order determination problem for linear differential systems, i.e., the model (3) 
where the order n is unknown in addition to the parameters (nj • ■ a n ,b q • ■ b n _{), by making 
use of an important property of modulating functions, namely: any modulating function of 
order N is also a modulating function of order n for any n<N. Thus, by using a set of 
Fourier based modulating functions (as defined by Eq. (4) in Appendix C) of preselected 
order iV^maxn, the algebraic equation counterpart to Eq. (10) of Appendix C is the equation: 4 

t“n-l CD ,+ ‘ "£ V w CD M U, Oo =l ( 6 ) 

1=0 i=0 


*=0,1 ■ • N-n 

Defining the vectors W (m ) and V (m ) by 

W(rn)=CD m Y, V (m )-CD m U 


Eq. (6) can be rewritten as 

= {'Zb i q- i )V{m- 1), m=/i,n+ 1 • • N (7) 

i=0 1=0 

where q~ x is the unit delay operator, i.e., q~ l W (m)=W(m-l). Although arriving at (7) is 
merely a redefinition of previously defined quantities, i.e., calculating the sequence pair 
(V(m),W(m)) is easy once the finite set of Fourier series coefficients of the input/output data 
has been calculated, Equation (7) is now in a form that can utilize well known discrete system 
algorithms if desired. However, here we employ this equivalent form because it facilitates 
iterating on the order n . 

Without going into further algebraic details (full coverage of which will be put in a 
forthcoming paper), we use the parsimony principle in finding the simplest model, i.e., the 
lowest order, that adequately fits the data. The algorithm minimizes the least squares cri- 
terion: 

E n = X* '<**«(*) (8) 

k=n 


over the 2 n parameters (a x • • a n ,b 0 ■ ■ b n _ 1} iteratively for each n starting from n= 1, where 
e n is defined in terms of the equation error for (7). The decision rule for stopping the itera- 
tion is: 


n = mm« 


n | D„<8, l^nZN 


► 


(9) 


where 


4 The precomputable matrix pair (C£>) and the vector pair (U,Y) of finite Fourier series 
coefficients of the I/O data are defined in Section 2 of Appendix C. 




D 


n 


En 


E/i+l 


Ei 


Thus, the iterations are stopped when the rate of improvement in the least squares criterion 
falls below a user chosen threshold 5. 


The above described algorithm has been applied to several examples, including the fol- 
lowing fourth order Chebyshev filter system: 


G(5) = 


0.0438 

s 4 +0.6 1 92s 3 +0.6 1 4s 2 +0.2038 j +0.0492 


( 10 ) 


With the threshold choice 5=0.1 in (9), the result of one simulation for the system (10) is 
summarized by the graphs in Fig. 1. Shown here is the input/output data on a 40 sec time 
interval, a plot of D n verses n , and frequency response plots comparing the magnitudes of the 
transfer functions for the original and estimated system. The T =40sec time interval is approx- 
imately double the settling time for the system (10); this results in a resolution frequency of 
CD 0 =2jt IT =0.157 rad/sec which is adequate to resolve the modes in the frequency response for 
(10) as seen by the magnitude plot in Fig. 1. The output data included about 10% RMS addi- 
tive white noise which accounts for the deviations in the frequency response plots. These and 
additional examples confirm that the algorithm has the potential to correctly determine the 
system order under moderate noise-to- signal ratios. Further research is planned to extend the 
above structure determination algorithm to polynomial I/O differential operator models. 


o(t) 



A. 



y(t) CThcW*x /o% A^ck<.) 
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3.3. Data Collinearity 

Another problem we have addressed during the past year is the degeneracy in a least 

squares estimate caused by feedback. The problem for aircraft has been described in Klein 5 
and is most obvious in die case of linear system identification since linear feedback will 
directly cause collinearity between the input/output regressors in the absence of external inputs 
during the observation interval. Using the algorithm of the Appendix C paper and the setup 
of Fig. 2, we have been studying the tradeoff between estimation accuracy in the parameters 
verses the degree of collinearity between the I/O regressor vectors and the RMS noise level of 
the contaminating noises (v,w). The degree of collinearity is controlled by inserting the 
external signal d(t ), i.e., perfect collinearity results (hence complete degeneracy) under the 
condition: d(t)= 0. 


V(t)-4(Z)- 

IDENTIFIER 

«- 

+ ‘ 

* 



u(t) 


OPEN LOOP 



SYSTEM 






d(t) -►(£) <- 

FEEDBACK GAIN 



Fig. 2 : Closed Loop System Identification 


An example simulation result is shown in Fig. 3 for the system with the forward (open) 
loop transfer function: 


G(s)= 


20 

s 2 +5s-5 


( 11 ) 


The top of Fig. 3 shows the input/output data for one particular external signal which resulted 
in the normalized correlation coefficient Nyu=— .9963, where Nyu is defined by: 

Y'U 

Nyu -mr\ 

and (U,Y) are the vectors of finite Fourier series coefficients of the I/O data defined in 


5 Klein, V., “On Parameter Estimation of Highly Augmented Aircraft,” AIAA Paper No. 89- 
3356. Presented at the 1989 Atmospheric Flight Mechanics Conf., Boston, MA, August 1989. 
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Parameter Identification Under Near 
Perfect Data Collinearity Conditions 

Fig- 3 
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Section 2 of the Appendix C paper. The lower portion of Fig. 3 is the result of numerous 
simulations under different d(t) signals to achieve various values of Nyu while increasing the 
levels of the contaminating noises until a threshold 5% accuracy level is attained in the 
parameter estimates using the least squares algorithm of Appendix C. This graph shows that 
starting from complete degeneracy, i.e., Nyu =—1.0, zero noise level can be tolerated, but mod- 
est nonzero noise levels can be present in increasing amounts as collinearity collapses while 
still maintaining the accuracy level in the estimated parameters of approximately 5%. 

The above results give some idea of the tradeoff between degree of collinearity and noise 
levels in the data for the least squares algorithm of the Appendix C paper, but the question 
arises as to how this compares with other approaches. This is difficult to answer, especially 
since there are not many algorithms available that are designed exclusively for differential 
systems. However, one easy comparison we have made is to apply the ARX parameter esti- 
mation algorithm of the System Identification Toolbox (by L. Ljung) in MATLAB. 6 We have 
been using MATLAB to carry out all simulations anyway, hence the ease in comparison. 
ARX identifies a model in discrete time which we then convert to a differential equation 
model using the MATLAB algorithm CONTIN designed for this purpose. One result of this 
comparison is shown in Fig. 4 which depicts the output error between the actual output and 
the predicted output (using the parameter values estimated by the two algorithms) versus 
increasing levels of RMS noises in the data. These simulations were earned out under non- 
collinearity conditions d(t )* 0 for the system (11) and show the superiority of the algorithm of 
Appendix C in relation to the ARX/CONTIN algorithm for this example. It must be admitted, 
however, that ARX is designed for discrete-time models and, therefore, such a comparison is 
somewhat biased. 

The above study follows an earlier investigation into two approaches to alleviating the 
degeneracies of collinearity; one approach utilizes projection operators designed to zero out 
the collinear vectors known to cause the degeneracies thereby obtaining a least squares regres- 
sion equation with fewer parameters and a better conditioned estimation problem. A second 
approach involves subdividing the total observation time interval into subintervals during 
which linear feedback is in effect, then redefining the modulating functions relative to one or 
more of these intervals in order to formulate the least squares estimation problem specific to 
the intervals causing the degeneracies. Neither of these approaches has shown any advantage 
to the straightforward application of the Appendix C algorithm, an example of which has been 
briefly discussed above for the system (11). An Sc.M. thesis is under preparation by A. Full- 
erton detailing these investigations. 

3.4. Frequency Analysis 

A method of frequency analysis for determining the transfer function GO'©) from tran- 
sient I/O data has been formulated in the Appendix D paper using complex valued Fourier 
based modulating functions in contrast with the trigonometric modulating functions used in 
the Appendix C paper for the parameter estimation problem. We started this investigation 
with the expectation that the more explicit representation of the complex form (compare Eq. 


6 PC-MATLAB by The MathWorks, Inc. 
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(A.14) in Appendix C with Eq. (3) in Appendix D) might be valuable in answering questions 
about optimal inputs for parameter identification, optimal modulating functions for noise 
suppression, and the like. Although this work is ongoing, it became clear that the complex 
form is ideally suited to the frequency analysis problem since it facilitates a linear-in- 
parameters least squares formulation for frequency related parameters. Several variants of the 
formulation are possible, but all utilize a finite set of the Fourier series coefficients calculated 
from the I/O data over time intervals / =1,2, • ■ , each of duration T q=2k/co 0 , where 

©q is a user selected “resolving” frequency. 

A simulation result of applying the algorithm detailed in Section 2.2 of the Appendix D 
paper is given in Fig. 5 under noise-free conditions for the system with the low pass transfer 
function: 


G(s)=- 

s 


1.75 2 +1736.8 

3 +19. Is 2 +257 ,48s +1736.8 ' 


( 12 ) 


The nine seconds of I/O data shown was subdivided into nine [0,T 0 ] intervals, i.e., T 0 = 1, and 
the algorithm produced essentially perfect estimation of the magnitude/phase plots for G (j co) 
at the frequencies k2n, it =1,2 • • 6, as shown by the rectangles. Also shown (by the x marks) 
are the magnitude/phase values that resulted from a direct computation of the ratio of the 
Fourier-type integrals: 

9 

\y(t)e~ jk2m dt 

| , *=1,2 • • 6 (13) 

I’m it )e~i klm dt 


The reason for the errors in the above “direct ratio” is due to the finite time data, i.e., the 
finite limits of integration in (13). 

A similar comparison is shown in Fig. 6 relative to the high pass transfer function: 

s 3 +22.02s 


G(s)=- 


s 3 +22.24s 2 +247.44 s +1943.23 


(14) 


This comparison shows that the error in the direct ratio will generally be more pronounced 
under nonzero initial conditions. 

As the formulations in the Appendix D paper undergo further investigations, one 
modification that has been made is the removal of the presumed knowledge of the DC value 
G(0). These and additional simulations under noise corrupted data conditions will be the 
focus of some our future work in this area. 
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Magnitude Verses Frequency (rad/sec) Phase Verses Frequency (rad/sec) 


Fig. 5: Frequency Analysis for the Low Pass System of Eq. (12) 


Magnitude Verses Frequency for the 
High Pass System of Eq. (14) 

Fig. 6 
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LEAST SQUARES PARAMETER IDENTIFICATION 
OF NONLINEAR DIFFERENTIAL I/O MODELS 1 

A. E. Pearson 

Division of Engineering 
Brown University 
Providence, RI 02912 


ABSTRACT 

A least squares parameter identification technique is formulated 
for deterministic systems modeled by a class of input-output non- 
linear differential operator equations. Based on the notion of 
exactness in the calculus, a distinction is made on the basis of 
whether or not the equation error representation is an exact 
differential expression. It is shown how equation error models 
which are exact can be integrated for any given input-output data 
pair to yield an explicit function of the parameters that can be 
used for standard least squares estimation techniques. The formu- 
lation is then extended to apply to a class of inexact equation error 
system models. Also discussed is the notion of ‘identifiability* as 
it relates to the class of systems under consideration. 

1. Introduction 

Consider a class of deterministic systems whose input-output 
(I/O) relation can be described implicitly by a differential operator 
equation of the form: 

E(y(0^y(0,** p n y(t)*u(t)*pu(t)r p m u(t)fi) = 0 (l) 

where p denotes the differential operator d/dt so that p 2 =d 2 ldt 2 , 
etc. It is assumed that E is a specified scalar valued nonlinear 
function of the output y(0 and its first n time derivatives, the 
input u(x) and its first m time derivatives, m<n f and a parameter 
vector 0; the latter is to be estimated given the input-output data 
[u (x),y (f )]* but not derivatives of the data, on some time interval. 
In the case of linear systems, i.e., E(v. 0 linear in each of ns 
arguments, equation error models have long been used in various 
parameter identification techniques (see for example Mendel [1]). 
A property often taken for granted with linear models is that they 
can be integrated, given continuous-time input-output data, or 
summed given sampled data for discrete-time system models, in 
order to obtain an explicit function of the coefficient parameters. 

Such models seem not to have been explored very much for non- 
linear systems, the emphasis instead being on techniques that deal 

with the normal form state vector equations: 2 

i(O=/(*(O,w(f),0) 

y(O=c(x(O,u(r),0). ( 2 ) 

where if ,c) are given functions, generally nonlinear in x and u, 
parametrized by 0. Although these models apply to a broad class 
of physical systems, it seems difficult to ease the computational 
burden in parameter identification by exploiting any special struc- 
ture like linearity in certain state variables or parameters. In addi- 
tion, unknown initial conditions have to be appended to the 
parameter vector for time limited data unless the data is collected 
under special conditions. 


* Research supported in part by the National Science Foundation 
under Cram ECS S7 13771. 


The purpose of this paper is to show how special structure 
can play a role in easing the parameter identification problem for a 
class of nonlinear systems subsumed by (1). 3 The main assump- 
tions are: (i) that the data is free of measurement noise, and (ii) 
for an arbitrary input-output data pair [u(f),y(f)], observed over a 
time interval 0<r£7\ there exists a parameter vector 0* such that 
the model (1) is satisfied on [0,7*] with 0=0* . Thus, modeling 
errors are presumed small enough that deterministic least squares 
will be meaningful. In addition, standard causality and continuity 
conditions are tacitly assumed so that a unique bounded solution 
y(r) exists satisfying (1) on any finite time interval [0,7] given 
the system parameter vector 0 and a bounded input u (f ) over the 
interval [0,7], together with the appropriate initial conditions. 
However, being able to solve uniquely for y(x) given (0,u(r)]» 
0<r<7, and the initial conditions docs not necessarily imply being 
able to integrate the parametrized equation error (1) relative to a 
data pair [u(t\y(t)) on [0,7] with the aim of obtaining a function 
of 0 useful for parameter identification purposes. 

As examples to illustrate the affect of structure, consider first 
the forced Van der Pol equation: 

y(r)-0i [l-y 2 (f)]> ; (f)+ 02 y (' )*«(')• < 3 > 

Noting that py 3 =3y 2 py, (3) is equivalent to the differential 


operator equation: 

(p 2 -e ,p +0 2 X> - (» >+-j 0 ip>' )-u (f )=o < 4 ) 

and therefore it is the second total differential of a function z (r ,0) 
defined implicitly by the solution to the following equation: 

p 2 z (t ,Q)=(p J -e iP+Qj)y(t >t~j 0iP.y 3 (O-u(r). (5) 


With due consideration paid to unknown initial conditions, the 
solution z(f,0) provides the basis for an explicit equation error 
function of 0 given that [u (/),y (r ),y 3 (0] are regarded as forcing 
functions on some time interval [0,7]. It is noted that (4) can also 
be expressed in the following vector-matrix form: 


O.Mz) 


0 

1 


0 

p 

0 


1 

0 

0 


y(0 

|y 3 (0->(0 

-u (O 


= 0 . 


( 6 ) 


Contrasting with the structure of the preceding example, the 
following differential operator equation does not share a similar 
property: 

p 2 y(r)+0j \pyit) j 2 -e 2 w (/>=0. (7) 


2 For example, the method of quasilincarization (Bellman and Kala- 
ba [2]). 

3 The first four sections of this paper follow closely those of an ear- 
lier paper presented at the 1988 C1SS meeting at Pnnceton University 
(31. 


88CH253 1-2/68/0000 - 1831 SI .00 c 1988 IEEE 


1831 
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This equation, which models the position y(r) of a particle subject 
to force m (r ) and drag proportional to the square of the velocity, 
cannot be integrated to yield an explicit function of 0 when given 
the data [u(r),y (01 over [ 0 ,T] because ( 7 ) is not the total second 
differential of any recognizable function z (/ ,0) as in the case of 
the model ( 4 ). What is needed is some kind of '‘integrating fac- 
tor” to handle the nonlinear drag term since velocity is not a 
measured signal. Utilizing the differential identity. 
p*(y*)= 2 y (p 2 y)+2(py ) 2 , (7) can be equivalently represented by 


2 -0 2 m ( f )-e 1>- (r ) [p v (r ) )=0 

* < 


( 8 ) 


which is in a form suitable for later reference. 

The inherent difference in structure displayed by the above 
examples motivates the equation error identification techniques of 
this paper. The desired properties of any such technique can be 
described abstractly in terms of an operator B whose domain coin- 
cides with the range space of £ and whose image element as a 
function of 0 is expressed as: 

c(0)=2? (£(y py * 'P n y ,upucp m u ,0)) for a given data pair [u,yj. 
This integral-type operator should possess the following 

properties: (i) e(0) can be explicitly computed as a function of 0 
given the input-output data on a time interval [ 0 , 7 * ] without the 
need to estimate unknown initial conditions, i.e., time derivatives 
of the data arc not available, and (ii) e (0) provides a measure of 
error in the parameters such that 1 1 e (0) 1 1 =0 correctly reflects the 
true value of 0 for a suitable norm | | * | | under ideal noise-free 
conditions. Hence, the minimization problem: min 1 1 e ( 0 ) 1 1 facili- 
tates the potential for obtaining a unique least squares estimate of 
the parameters under appropriate nondegeneracy conditions on the 
data. The ease with which such an operator B can be devised 
possessing these properties depends strongly on the nature of the 
model. 


2* Structural Considerations 

Motivated by the notion of an "exact differential" in the cal- 
culus, a system model of the input-output type (1) will be called 
exact if it admits to the representation: 

Eiypyr p n y*upur p m u, 0 )=^/ > l (p)£, tv»«i 0 ) ( 9 ) 

i»! 

where the £,*<?). «’=l,2,* /i |, are polynomials of degree <n in the 
differential operator p and the E t are nonunique but sufficiently 
smooth nonlinear functions of the triple (y^, 0 ). If such a 
representation does not exist, then the model (1) is said to be ex- 
act. The significance of the model being exact is that it is the 
total differential of order n with respect to time t of some func- 
tion r(f,0), i.e., there exists a sufficiently smooth function z(f,0) 
such that for each fixed value of 0: 

p*z(t ,0)= £ P> ip )Ei Cv( it ). 0 ). ( 1 0) 

/= l 

Another basic model property important for system 
identification is that of separability with respect to the parameters. 
Thus, the model ( 1 ) is said to be separable with respect to the 
parameters if there exist scalar- valued functions /t ( (0) and 
Ei(y j>y sp n y m p m u )* suc ^ ^ai 

Eiypyr p n y^pit ,*• p m ufi) 

^hiWEiiyjyr P*yMj>ur P m u), (11) 

i= 1 

In this case it is assumed that the vector function h( 9 ) defined by 
h (0)=col(h i(0),- /i fll (0)) satisfies the single-valued property: 


h (0)=/* (0* ) if and only if 0=0*. 

All linear system models are clearly exact and separable with 
respect to the parameters. If a nonlinear system model of the 
form (1) is both exact and separable with respect to the parame- 
ters, then consistent with the above definitions it will admit to the 
representation: 

Eiypyr p n y ,upu*' p m u,0)=£M0)P,,(p)£ ; (y,a) (13) 

‘•7 

for some polynomials P {J ip) in P, eac ^ °f degree <n , and non- 
linear functions E^(y,u), none of which depend on the parameters 
0, together with a vector function h (0) which is assumed to satisfy 
( 12 ). Model ( 4 ) provides such an example via the equivalent 
vector-matrix representation in (6). 

Needless to say, inexact and nonseparable parameter models 
are inherently the most difficult to handle. The following sections 
will indicate ways of devising an operator B with the desired 
equation error properties for exact and a class of inexact models. 4 
For the most pan these methods will be efficient only for models 
which are separable in the parameters, or models which possess at 
most one or two nonseparable parameters. Letting a denote the 
nonseparable parameters), the model (1) can be referred to as 
partially separable with respect to the parameters (0,0) if it 
admits to the representation (cf. (11)): 

Eiypy p n y,u pur p m u,a, 0 ) 


^^(©^(y^py,- p n y\upur p m u y a). (14) 

i 

Time lag systems with unknown delay parameters provide exam- 
ples of such models. Furthermore, if the model ( 1 ) is both exact 
and partially separable with respect to parameters (a,0) (cf. ( 13 )): 

Eiypy," p”y,u pu>- p m u,a,0) 

( 0 )£; ; ip )Ej (y m ,a). ( 1 5 ) 

«./ 

As before, the vector function h(&) comprised of the /»j( 0 )’ s in 
( 14 ) and ( 15 ) is presumed to satisfy (12). 


3 . Exact Differential and Separable Parameter Models 

Consider the class of exact and separable models represented 
by ( 13 ). In this case there exist at least two different ways of 
defining a linear operator B such that the norm of e ( 0 ): 

e(0)=L /, i ( 0 ) 5 CLPijiPV) O' )) ( 16 ) 

* J 

has the properties of a metric function given certain nondegen- 
eracy conditions on the data [M(f) t y(0], 0 </< 7 *. In general terms, 
nondegeneracy is guaranteed if the set of functions 
ip )Ej (y (t ),« (0)). t=l,2 * , are linearly independent on 

[ 0 t T] and do not lie in the null space of the operator B . Then the 
abstract equation e(0)=0 has a unique solution 0=0* which is the 
desired value for the parameters under ideal noise-free conditions. 
Further, 

min | (0)1 1 provides a least squares estimate under nonideal con- 

ditions. 


4 Also discussed in [ 3 ] is an intermediate class of provisionally ex- 
act system models that reduce to exact s> stems when subjected to spe- 
cial inputs over finite time intervals. 
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3.1. Operator B Via Moment Functionals 

One way to specify an operator B with the desired properties 
is via the classical Shinbrot method of moment functionals, also 
called the modulating function approach, which converts linear 
differential expressions on a finite time interval into algebraic 
equations (see [4]). Using Fourier based modulating functions, 
this approach has been used in Pearson and Lee [5,6] to formulate 
parameter identification of both linear and polynomial input-output 
differential operator models. The basic ingredient here is a vector 
0(0 of modulating functions defined as follows: 

O(i) = Cf(0. 0<;<T (17) 

where f(r) is defined by the (2L + l)xl column vector of commen- 
surable sinusoids 

f (r )=col ^Lcoscoof^incoor; * • ;cosLco 0 / # sinLco ( /j (18) 
0< i<T t co 0 = 2ji/7 

and C is a (2L+l-n )x(2L+l) matrix whose rows are determined 
by the end point conditions: 

O (n (0) = O (0 (D = 0, i =0,1,2 *(n-l). (19) 

Here 0 (,) (r) means p‘0(r). It can be seen from (17) and (18) that 
the time derivatives of 0(f) have the representation 

(-iyO (i) (0 = CD 1 f(/ ), i *0,1 A- (20) 

where D is an operational matrix defined by the block diagonal 


structure: 

D = OJq diag [ 0 . [_ 0 , j] • ' ’ [_° L o) 1 (21) 


To invoke the method of moment functionals for the model 
(1) which possesses the representation (13): Multiply both sides of 
(1) by O(r), integrate over the lime interval [0,7] and interchange 
integration and summation over i, then use integration-by parts 
while noting (19) and (20). The result is the following vector 
algebraic equation which essentially defines the operator B . 


e‘(6)=I MWCPylDWj. (22) 

*./ 

Note that the differential operator p in the polynomials P tJ (p) has 
been replaced by the block diagonal matrix D of (21). The Vj are 
(2L+l)xl vectors defined by 
T 



As discussed in [5,6], these data-related vectors can be computed 
by well known DFT/FFT techniques since they are finite Founer 
series coefficient vectors for the functions vj(t )-Ej(y (t ),u(f)) on 
[07]. The Euclidean norm of e(0) in (22) then provides a metric 
function p(0,0* ) on the space of adjustable parameters, and the 
square of this norm defines a suitable function 7(0) for least 
squares minimization: 

jowiQwm < 24 > 


where the («,/) component of the symmetric nonnegative definite 
matrix Q is given by 

vMtryccPfmv,. ( 25 ) 

i./ 

Hence, the major computations to set up the least squares problem 
in this approach is to calculate the vectors of finite Fourier senes 
coefficients for the data related functions £ y (y(f),i<(r)) followed 
by the inner products involving these vectors in (25) to obtain fi. 
The null space of the operator D is essentially all functions which 
are orthogonal to the sinusoids comprising f(t) in (IX). Hence, 
choosing a sufficiently large integer L shrinks the null space so 


that from a practical standpoint nondegeneracy for the least 
squares problem means linear independence of the functions 
(l/ , i; (p)^(y(0.u(0)l. i=1.2 . on [OTl. However, based on 

finite bandwidth considerations, choosing L large will necessitate 
a smaller ‘resolving frequency' coq and therefore a longer [0,7] 
interval, as seen from (18). These issues are more fully discussed 

in [5], 


3.2. Operator B Via State Variable Filters 

Another such B operator is basically a projected state van- 
able filter (see [7,8]). Its specification can be summarized as fol- 
lows. Let F (s) be a polynomial of degree larger than n chosen 
by the user so that F~\s)P(s) is a strictly proper and stable 
transfer function matrix with the polynomial matrix Pip) defined 
by the P,j(p) in (13). Let the filtered signals z;(r), 0<r£7, be 
defined implicitly by zero state solutions to the differential opera- 
tor equations: 


F(p)Zi (r)=XP,, ip )Ej (y (f ),u (r )). (26) 

i 

In order to obviate dealing with all unknown initial conditions 
define a projection of these signals on [0,7] by the relation: 

T 


h {t)=z i (f)-c 0 e A ‘ VF'’Je AT c 0 'r, <t )d x, 0 <7 


(27) 


where ( A,c a ) is an observable realization for the homogeneous 
equation: F(p)z(f)=0, and IF' 1 is the inverse of the observability 
Gramian for this realization over [0,7], i.e., 

T 

W=je AI c g 'c a e Al di. 

0 

Then the integral squared norm of the following function 

eM)=Pi(e>*<<0 


has the desired metric properties in that the square of this norm 
defines a suitable function for least squares minimization: 


7(6)=/j'(0) 


T 

jz(t)z'(t)dt 

0 


h ( 9 ). 


(28) 


Here z(f) stands for the column vector of functions with com- 
ponents 2 i(t). Hence this specification of the operator B essen- 
tially involves integrating the linear differential equations (26) 
with forcing functions vy(f)=£y(y (0 ,m( 0). performing the projec- 
tions in (27) which strip away the affect of unknown initial condi- 
tions, and calculating the Gram matrix for these projections in 
order to define the least squares function (28). 


3.3, Partially Separable Parameter Models 

Consider the class of models which are exaa and partially 
separable with respect to parameters (a,0) so that the representa- 
tion (15) holds. In this case the least squares functions of (24) 
and (28) will take the forms 

7(a,0)=/t'(0)ft(a)/j(0). (29) 


and 


y(a,0)=/i # (0) 


Jz(f.a)i'(f.a)i/» 


A(0) 


(30) 


respectively. It is seen that the dependence on the nonsep.irable 
parameter a comes through the 1 , vectors lor f), i.e., 
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T 

Vj( a)=f£,(>\u,a)f(r)<*. 

0 

while the filter signals z, (/ ) depend on a via 

z, (i r ,a)=F“ l (p )£Py (p )£ ; O' *«>■ 

/ 

If the source of the nonseparability is a delay parameter, e.g., 
EjiybXuiO&^EjiyWMit-CL))* then as shown in Pearson and 
Wuu (9] for differential-delay equation models, it is possible to 
derive the ‘variable projection* functional using nonlinear least 
squares theory [10] in order to first estimate the ct parameters, 
after which the separable parameters may be estimated as before. 
In this way, it is possible to decouple the difficult- to-esti mate Ct 
parameters from the more straightforward 8 parameters. 

4. A Class of Inexact Differential System Models 
Suppose the model (1) admits to the representation 
E(ypyyP n yM ,“p m u ,0) 

(9)F 0 (y m )Pij ip )Ej (y*) (31) 

where again the P^(p) are polynomials in p of degree and the 
vector function h(Q) comprised of the /i;(0)s satisfies (12). 
Model (7) is an example of this class when expressed in the 
equivalent form (8). It will now be demonstrated that under cer- 
tain smoothness conditions pertaining to the data-related functions 
gijO) defined by 

8ij(t)-F i j(y(t) y uU)), 0 <t<T (32) 

the method of moment functionals of the previous section can be 
used as a kind of ‘integrating functional’ for the inexact model 
(31). In order to do so, it will first be noted that the multiplica- 
tion of any modulating function <J)(r ) of order n over [07] by an 
arbitrary n -times continuously differentiable function g(r) defined 
on (071 is again a modulating function of order n on [07]. That 
is, given that a n -times continuously differentiable function <KO 
satisfies the end point conditions: 

$ (|) (0)=<I> (, ‘ \T )=0, i =0,1 ,2,-(rt -I ), 

the function 0(r)=g (04>(0 satisfies the same end point conditions 
for any sufficiently smooth function g(/) on (07]. Relative to 
the functions g iy (r) of (32), it will be assumed that each such 
function can be represented to any desired degree of accuracy by a 
finite Fourier series, i.e., for a sufficiently large integer M , 

M % A _ 

gij (0= z [« a ij ( m > cosm +b ij ) sm ™ 57 ( 33 ) 

m=0 

where the (o l7 (m )J>ij (m )) coefficients can be computed via well 
known DFT/FFT techniques for each g^t) function. 

Now invoke the method of moment functionals for the model 
(1) satisfying (31) using the Fourier based modulating functions 
defined in (17) and noting the aforementioned property of such 
functions. Thus, multiply both sides of (l) with the representation 
(31) by <J>(/) while using the expansions (33) for each 
pv(y(r),u(f)) function in (32), integrate over the time interval 
[0,T] and interchange integration and summations, use 
integration-by-parts while noting (19) and (20), and take into 
account the trigonometric identities: 

2cosJLx cos/nr =cos(* -m )x +cos {k +m )x 
Islnkx cos mx =sin (k -m )x +sin (k +m )x 
2sinLr siirnr =cos(* -m )x -cos(* +m )x. 


The result is the vector algebraic equation: 5 

em=Zh t mV(iJ)Vj (34) 

* J 

where the matrices W(iJ) and vectors Vj arc defined as follows: 
W(iJ)^^[a ij (rn)Q m ^b lJ (rn)R m ]P tJ (D) (35) 

m= 0 
T 

Vy =jEy (y (r ),u (» ))?(/ )dt. (36) 

0 

The matrices Q m and R m in (35) are quasi-banded structures that 
arise from the interactions between the basis functions in (33) and 
the commensurable sinusoids comprising the Fourier based modu- 
lating functions in (17)-(18). Their precise representations can be 
found in Equations (25) and (26) of Pearson and Lee [11] where 
they arose in a different context. The tilde over the various 
expressions in (34)-(36), in particular D in (35) and f(f) in (36), is 
meant to indicate that the harmonic frequencies in these terms 
extend out to (L-hV/)o) 0 ; again, this is due to the interactions 
between the sinusoids in (33) and Fourier based modulating func- 
tions. The fact that the Fourier series coefficients for these higher 
order harmonics have to be computed is of minor importance from 
a computational viewpoint since the DFT/FFT algorithm will yield 
many more frequencies than are actually retained if a high degree 
of accuracy is employed. (Refer to the discussion in Section 2.2 
of [5].) However, noise in the data is important which may neces- 
sitate a longer time interval for the data, i.e., a shorter resolving 
frequency co 0 =2rc/7\ in order to cut off high frequency noise in the 
higher harmonics. 

The equation error expression in (34) essentially defines a B 
operator with the previously outlined desired properties, together 
with a kind of ‘integrating factor’ property to handle the inexaci 
aspect of the model (31). As before, the Euclidean norm of e(9) 
in (34) can be used to define a positive definite function /( 8) suit- 
able for least squares minimization. 

5. Uniqueness of a LS Estimate for Separable Models 

The single-valued property assumed for the function h{B) in 
(12) makes the statement of conditions for a unique least squares 
estimate more transparent than the conditions attending other 
models and approaches. To elaborate on this a bit more in 

relation to the separable model (11) let the functions v^r) be 
defined by 

^ (r )=£, < y (r )j>y C t)rp n y (/ ).a (* )rp m u (r )) 
and define e(r ,0) by 

€(r,e)=£/ii(0)Vi(O. 

i=i 

Suppose 0* is a value of the parameter vector such that e(/.9* )=0 
for any input-output pair \u(t)y(t)] on (07]. Thus for any par- 
ticular input-output pair leading to a particular sequence of func- 
tions Vi(f): 

e(r.0)=L (e>-*i(0* )] v ( -(r). 

Hence uniqueness of any least squares estimate is predicated on 
linear independence of the data-related functions v,(r). *=l,2 
on [07], In turn, this dictates linear independence conditions on 
the input-output data and its derivatives in order to guarantee 
uniqueness. Of course, it is the projections of these functions via 


5 It is assumed that if not, simply add more modulating 

functions until LZ\1 . 
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the operator B that must be used to test uniqueness since deriva- 
tives of the data are not presumed to be available. 

As an example for comparison with other models and 
approaches, consider a two companmental model from chemical 
kinetics (Example 7.1 in Walter [12]): 

y (r )=-9 x y( ))? )+ w ( f ) ( 37 > 


X i(t )=02( 2^ ))? )“^4 JC 2^ ) 

where ( 0 |, 0 ^ 03 * 04 ) represent parameters to be determined given 
the data [u{t)y(t)] over some time interval. Under the assump- 
tion that y(r) is positive for all t of interest, i.e., y(O>0, 0<r<7 , 
the following differential operator equation can be derived by 
eliminating the state x 2 (t) from the above equations: 



l 0 0 0 0 


y( t) 


p 10 0 0 


-U(l) 

h'(Q) 

n n n i n 


ln(y(0) 


KJ fr l \t 


-u(t)/y(t) 


0 0 0 0 1 


1 J 


+ 


* +1) W 


|p 2 +p,p+l 



(38) 


where the components of the vector function h (0) are defined by: 

/lj=0j0203, ^2 = ®2®3» ^3 = 04» ^4 = (01+®2)^4- 

Bearing in mind the previous discussion for the model (31), first 
approximate the function I/y(r) by a finite Fourier series as in 
(33) with coefficients ( a(m),b(m )) and then replace the pairs 
(a(m),b(m)) in (35) by the pairs 
(a (m )+m 0 )^ (m (m )-m (m )), thereby achieving an 

approximation to the function <p + l)(I/y (0) which plays the role 
akin to g tJ (t) in (32). 

The validity of extending the model (31) to the example (37) 
will require a greater degree of smoothness on the part of the data 
since the differential expression (p + l)(l/y (0) is evaluated in 
terms of the approximation for l/y(/). However, apart from the 
question of approximation, it is noted that since there is a one-to- 
one map between h and 0 (except for a set of measure zero) the 
necessary and sufficient conditions for the existence and unique- 
ness of solutions to the parameter identification problem email 
linear independence of the four signals: 
(y(r), py(t)-u(t), p(\t\y(t))-u(t)/y(t), 1] on [0,7]. This should 
be compared with the ‘generating series’ analyses given in [12] 
for deciding the issue of ‘structural identifiability’ of the model 
(37). That is, not only is the issue of ‘identifiability’ resolved by 
inspection of (38), but the conditions for the uniqueness of a least 
square estimate can be related to linear independence of the pro- 
jections of the appropriate signals. 

6. Conclusions 

Input-output nonlinear differential operator equation models 
have been classified according to the calculus notion of exactness 
for differential expressions. It has been shown that exact 
differential models can be integrated to yield a positive definite 
function suitable for least squares parameter identification given 
the I/O data on a finite time interval. This is not possible for 
inexact differential models. However, under a smoothness condi- 
tion on the data, a kind of 'integrating functional’ has been found 
via the classical method of moment functionals that permits 
integration of the parametrized equation error for a class of inex- 
act models. The formulations also facilitate determining condi- 
tions on the I/O data that guarantee uniqueness of a least squares 
estimate. 
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Abstract 

If a specified set of parametrized nonlinear state differential 
equations possesses an equivalent input -output differentia] opera- 
lor nv»fr»i, then the latter model has the potential for deciding the 
issue of the identifiability p rope rt y of the former model in a 
straightforward mann er. This property is discussed in relation to 
p r oper ty of wcU-poscdness for the least squares parameter 
identification of a class of input-output models that are separable 
in the parameters. 

Introduction 

yttte differential equations of the form 

ymhfrjifi) O) 

constitute the most commonly used starting point for investiga- 
tions i rtA the parameter identification of deter m i n istic nonlinear 
differential systems. Given the input-output data [u(f)o'(0] on 
some time interval 0£r£7\ together with the functions / (x ,« ,0) 
and A (x , 11 , 0 ), a number of methods art available for estimating 
the parameter vector 0 such as quasilinearization, invariant 
embedding and the like. A theoretical issue considered by 
several investigators, e.g [1-3], is that of the “identifiability’ 1 of 
tike model (1). Although the definitions vary somewhat from one 
investigator to another, failure of the model (1) to be identifiable 
usually means a kind of over-parametrization or degeneracy in 
ti*yt it cannot be demonstrated that there exists a suitable input 
signal u(r) and/or initial state x^CO) such that the integration 
of the state equations corresponding to any two distinct parame- 
ter vectors 0« and 0*, P*** rise to solutions y.(0 and 

y*(f) that can be distinguished from one another on [0,7]. An 
equivalent input/output differential operator model of the generic 
form 


£ (y *y rp n y # *> u rP m * a) 

where p denotes the differential operator dldx so that p 2 ^ 2 /^ 3 * , 
Etc K»t the potential for this deter m i nati on more Iran- 

recent provided *ucb re equivalence exist*. 1 A simple example 
fl lustnring this situation is the scalar bilinear system (Example 2 

In PI): 

i«l+x+(l+x)i< 

y ■€(!+*). 0) 

which has the equivalent input/output (I/O) relation 

ywy+yu. W 


2 The motivation for this study stems from the development in [4] 

of s least squares parameter estimation algorithm which is applicable to 

ctnain s ubclasses of the generic model (2). 


The latter model clearly fails to be identifiable no matter what 
definition is employed since it is completely independent of the 
parameter 0. A less obvious example is the following second 
order system (this is Example 1 m [3]): 

if*? I ■**?**" 

*ar®>*?+Vj*2 < 5 > 

yror,. 

If it is ats p^^d that y(f) is of fixed sign for all fE[0,7"]» e.g., 
y(r>0, 0 £r£T, then (5) has foe following equivalent I/O rela- 
tion: 1 

where p (lny )“y /y • The identifiability property of this example 
(actually foe nooidentiiUbility) will be made evident after a dis- 
cussion of a subclass of foe model (2) which includes (6). 

Separable I/O Models 

Consider foe generic model (2) which is separable in foe 
parameters, Le., there exist scalar functions g<(0) and 
£*(y &y *'P*y *"P m such that the function £ 
in (2) admits to the representation 

£ (y w rp*y # rp M * .0) 

<-l 

For rey Input-output pur [eCr^CO] on fOJ], kt the functions 
v*(f ) be defined by 

V(<r)-£ 4 (y (0) (*) 

lm\3r* x 

and define e( 8,0 by 

«<e^>*2ili( e )Vi(0. (9) 

i-i 

If 0* is any value of the system parameters distinct from 0 and 
[»(')?(')] k w y valid input-output pair, then the function 
8(0,0* ,0 defined by 

Kfl.evKe/W®*-') (io) 

-X [r<(6)-»i(®*)]v<(») 

im 1 L J 

qske s clear the necessity of the singk-valuednes* of the vector 
function g (0) comprised of the gj(0) f s, Le., the condition: 

f (©)■* (0* ) if and only if (W 

is necessary else any parameter estimation scheme based on the 
I/O model (2) which is separable in the parameters according to 
(7) is doomed to failure. This is clear from the consideration 
that 8(0,0* *r) can provide for the basis of a metric function in 
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fe panmeter (pace only if (11) bolds. 3 Stated in an alternative 
way any least squares estimation scheme which is based on 
input-output data for the model (7) will not be well-posed if (11) 
fails. 

Example 2 Consumed: The (fc-Ej) for the model (6) can be 
defined as 

*4«®4 ( 12 > 

£ l ^» 1 (lny>-p(j). Efjry, E,*y\ E^-py. (13) 

Since both I/O models (4) and (6) possess g (0) functions that 
fail to satisfy condition (11), Le. g(0) is degenerate in (4) while 
(12) makes evident die over-parsmetriiation of (6) with respect 
So the (62.6s) parameters, there does not exist an input function 
«(r) which would facilitate a unique deter min a t ion of the 6 
penmeten given the input-output da t a. 3 

Assuming that condition (11) holds for an I/O model of the 
generic fora (7), it seems reasonable that the least squares esn- 
nation problem is well-posed. However, the question of unique- 
ness of an estimate is data -dependent and is more difficult to 
answer, especially when only input-output data (no derivatives) 
fe available. The following example illustrates this point (This 
is Example 7.1 in [1].) 

Exaiyle 3: A chemical kinetics model is defined by 
i i«-0i* j)*t + “ 

i3^2(l“0j jr 2 y* l“®4*2 

sntTn mating the Bate x 2 , the equivalent I/O model is found to be 


The above function g (0) is single-valued and moreover is one- 
to-one (except for a aet of measure nero); hence, the least 
aquares problem is not only well-posed but it is also basically 
in ^ $sg (0) parameters. A least squares estimate will be 
unique only if the corresponding regressor functions are linearly 
independent on [0,7] which for this example means linear 
independence of the functions: 

v,(rH>(r). Vj(r)-y(rH<(f) 


Concluding Remarks 

The above examples make clear die necessity of a single- 
▼tluedness property of «n equivalent input-ouput differential 
operator model is testing the identifiability of the state equation 
model (1) for the clast of I/O models is (7). Stated another way, 
the least squares parameter estimation problem will not be well- 
posed for this class unless property (11) holds. The conclusion 
leached in each such example is consistent with the results 
obtained by the cited investigators but seems easier to reach 
when an equivalent I/O model exists. The question of unique- 
ness of an estimate will depend on linear independence of the 
data -dependent regressor functions. The latter question is more 
difficult to answer owing to the necessity of projecting these 
functions down into a subspace (using for example state variable 
filters) in order to obviate dealing with unknown initial and 
boundary conditions. 
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p 2 Qny Yp (j >+ 0,020 w^frilpy -u ) 


*04 



0j+02)64«O. 


Here the representation (7) holds with 

gj-«,020j. gf* 20J- i?**> *4-<0i*02>04 

£,-y. Ew*. EjT>(biy)-A £ > 1 - 


2 A phyncally realiiabk metric fttnetion p(5,6* ), which is 
duncterized by coy particular input-output psii [«(r),y(0] on [0,7], 
can be defined only if e way eta be found to overcome tbe necessity or 
dealing with the derivatives of the data; *uch a way is discussed in [4] 
through die use of projected state variable filters tr moment fuocoonals 
for gp fyjf i subclasses of the mode! (2). 

* Nom dentifiab ility of these examples was also conclud e d by the 
i io [2] and [3). 
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Abstract. The parameter identification of linear differential systems is consid- 
ered from the viewpoint of Shinbrot’s classical method of moment functionals using 
commensurable sinusoids as the modulating functions. This facilitates a least 
squares formulation in which the underlying computations require calculating a finite 
set of Fourier series coefficients of time limited input-output data while avoiding the 
necessity to estimate unknown initial conditions for a one-shot estimate, or un- 
known boundary conditions at each stage for sequential least squares. It is noted 
that a fast Fourier transform algorithm can be utilized for these calculations, thus 
providing a “fast algorithm” for the identification of continuous- time systems. It is 
shown that the frequency domain interpretation can be useful in enhancing the 
signal to noise ratio of the modulated data in the presence of noisy measurements. 
A maximum likelihood estimate is developed for the stochastic case of additive 
white gaussian noise in the data which effectively removes the bias when the 
parameter identification is considered in a recursive mode. Simulation results are 
included to illustrate the developments. 

Key Words — Parameter identification, continuous time linear differential systems, 
least squares estimate, maximum likelihood estimate, Fourier modulating functions, 
fast Fourier transform algorithm. 


1. Introduction 

The identification of linear differential systems can be undertaken in a 
deterministic vein using the classical steady state frequency domain approach 
for estimating the system transfer function, or using a variety of methods based 
on a differential equation model in the time domain which would include 
quasilinearization, state variable filters, model reference techniques and adapt- 
ive observers (Young’s survey, 1981). In a stochastic vein, the known methods 
would include generalized least squares, instrumental variables, maximum 
likelihood and extended Kalman filtering techniques (Young’s survey, 1981; 
Astrom, 1981). The deterministic methods are computationally simpler but may 
incur significant biases in the presence of noise. The stochastic methods, while 
promising to remove the biases asymptotically, are computationally demanding 
to a degree that they are more likely to be found discussed and used in a 
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discrete-time format. Although a continuous-time model can be constructed 
from a discrete-time counterpart, the methods for accomplishing this are 
unwieldy and potentially unreliable due to such difficulties as nonuniqueness. 
Hence, a direct attack on the problem is clearly preferred if a continuous-time 
model is desired. Such is presumed to be the case here. 

Another classical method that can be applied to linear differential systems is 
Shinbrot’s method of moment functionals, also called the modulating function 
approach, which facilitates converting a differential equation on a finite time 
interval into an algebraic equation in the parameters (Shinbrot, 1957). In this 
regard let denote the differential operator d/dt so thatp 2 * d^ldt?, etc., and 
consider the following nth order differential equation model relating an input- 
output pair (u(t), y(0) for a single input-single output system: 

2 o a*_, p'y(t) = >„_,/>'«(/), flo = 1. (D 

As introduced by Shinbrot, <p(t) is a modulating function of order n relative to a 
fixed time interval [0, T) if it is sufficiently smooth and possesses the property 
that 


<p (i \ 0) = <p w (T) = 0, » = 0,1 («-l), (2) 

where <p (i) U) means />’0(f), i-e., <p(t) and its first (n-J) derivatives vanish at 
both end points of the time interval [0, T\. The significance of this property for 
system identification stems from the fact that if (w(/), y(t)) is presumed to 
satisfy the model (1) on [0, 7] then the multiplication or modulation of both sides 
of (1) with <p(t) followed by integration-by-parts over [0, T], while noting (2), 
leads to the relation, 

Z.(-l)' o*-i f y(t)<p (i) (t)dt 

,c ° Jo 

= %\-iy b„_ f u(t)<p (i \t)dt, fl 0 = I- (3) 

Furthermore, if {&-(*)}, »' = 1, 2 K, is a set of linearly independent 

modulating functions of order « on [0, T], a vector algebraic equation results 
which can be used to obtain a least squares estimate of the parameters (a„ b,), 
i = 1, 2, . . . , n, provided some nondegeneracy conditions are upheld. It is noted 
that the prime reasons for using such modulating functions are to avoid 
differentiating the data and to avoid estimating unknown initial conditions for 
time limited data. 

The above idea has been pursued using modulating functions which stem 
from Hermite polynomials, as in Takaya (1968), the Poisson process, as in 
Fairman and Shen (1970) and Saha and Rao (1979; 1980; 1982; 1983), spline 
type functions, as in Maletinsky (1979), and trigonometric or Fourier type 
functions, as in the authors’ (1983) + . However, with the exception of Shinbrot 
(1957), Maletinsky (1979) and the authors' works (1983; 1985), it will be found 


t The extension to other models has also been considered such as linear time varying systems with 
polynomial coefficients (Loeb and Cohen, 1965; Fairman and Shen, 1970; Saha and Rao, 1983) and 
certain types of nonlinear models (Shinbrot, 1957; Saha and Rao, 1983; Pearson and Lee, 1985). 
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that certain anomalies exist in the other formulations cited in that either a very 
long time interval is assumed, or the initial conditions are constrained in some 
way, etc. Notwithstanding these anomalies, the modulating function method has 
remained relatively obscure due in large measure to the rather severe computa- 
tional burden associated with the linear functionals on the data in (3). Thus, with 
the current emphasis on recursive methods in system identification, computing 
these linear functionals at each stage appears cumbersome unless some type of 
“fast algorithm” is available. Since the fast Fourier transform (FFT) is such an 
algorithm for the discrete Fourier transform (DFT), and since the latter can be 
used to calculate the Fourier series coefficients of a time limited signal with 
great accuracy, it appears as though the modulating function approach should be 
re-examined for continuous system identification using Fourier based modulat- 
ing functions. Although some aspects of the formulation have been developed in 
the works cited earlier, a number of issues remain to be examined even in the 
linear case. These include the handling of noise, structure determination 
procedures, optimal inputs for parameter identification, as well as the experi- 
ence to be gained via simulation studies. 

The contribution of this paper is to show how the frequency domain 
interpretation can be used to advantage in ameliorating the effects of random 
noise even in a deterministic setting and, further, to develop a maximum 
likelihood estimate within the context of the modulating function approach in 
order to eliminate the bias in the sequential least squares estimate when the 
data is corrupted by additive white gaussian noise. The maximum likelihood 
estimate will be a modification of the Levin (1964) algorithm for discrete 
systems identification, later analyzed in detail by Aoki and Yu (1970 a; b), 
tailored to fit the formulation of this paper for differential systems identification. 
Computer simulations will be presented to illustrate each of the developments. 

2. Least squares estimate 

A deterministic least squares estimate is formulated in this section given the 
input-output data (u(t), y(t )) over a fixed time interval [0, T ] for a one-shot 
estimate, or over a sequence of time intervals [/,, fc+i], i = 0, 1, ..., each of 
duration T, for a recursive estimate. Consider the set of commensurable 
sinusoids {cosma) 0 t, sinmai 0 f}> *« = 0, 1, M, where o) 0 = 2 n/T plays the 
role of a “resolving frequency” in the identification problem. This role will be 
discussed in the section on computational considerations along with guidelines 
for choosing (T, M). It could be expected that for a specified order n in the 
model (1) there can be found any desired number of linearly independent 
modulating functions simply by choosing M sufficiently large and subjecting 
appropriate linear combinations of these sinusoids to the end point conditions 
(2). Although there are many ways of doing this, a systematic procedure is to 
separately form linear combinations of functions from the two sets {cosma) 0 f} 
and {sinmwoO. m = 0,1, 2, .... M, and subject each combination to the end 
point conditions (2), the only stipulation being that M satisfy (2Af + 1 - n) > 0. As 
shown in the Appendix, this (offline) procedure leads to (2Af+l-«) linearly 
independent modulating functions, <p t (t), i = 1, 2, (2M+1 — «), which can 

be put in the vector-matrix representation: 


<?>(/) * Cf(t), 0 


(4) 
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where f(,t) is defined by the column vector of sinusoids: 

/(/) = col [1, cosa) 0 /, sincooif, cos2w 0 f. sin2(u 0 f, 
cosMa>ot, sin3/a> 0 f]> 

0 o)q = 2 jiIT, (5) 

and the rows of the (23/+l-w)x(2M+l) matrix C are determined by the 
solution to Vandermonde type matrix equations. Further shown in the Appendix 
is the important fact that the matrix C has full rank. The role of this matrix as a 
projector for the least squares problem will be clarified below. Since by 
construction <P(t) is a vector of modulating functions of order n it satisfies the 
end point conditions (2): 

<P (0 ( 0) = <P m (T) = 0, i = 0, 1, 2 (n-1). (6) 

It can be seen from (4) and (5) that the time derivatives of <t>(t) have the 
representation, 


(-1 y<P ( Ht) = Cirm, * = 0,1,2 


(7) 


where D is an operational matrix defined by the block diagonal structure: 


n _ ,• L r 0 ll r 0 2l 0 Af 1 1 /£} , 

D o)q diag j^O, 0 j, |^_£ 0 J \_-M 0 J J' 8 


Multiplying both sides of the model equation (1) with #(/) followed by 
integration-by-parts over [0, T], while noting (6), there results the vector 
analog to (3): 

i (~iy f T 0 M (t)y(t)dt 

= 2 1 (-1)' b ^ f a 0 = 1. (9) 

1=0 Jo 

Taking note of (4), (5) and (7), the preceding equation is equivalent to 

t CD'Y = S 1 CUV, fl 0 = 1. (10) 

1*0 1*0 


where (U, Y) represent finite Fourier series coefficient vectors of the data 
defined by 


f T 


u(t)f(t)dt. Y 


- f 


y{t)f(t)dt. 


(ID 


Rearranging (10) in terms of the parameter vector 6 defined by 


6 = col [ 0 \ ... -a„, bi ... b n ] 


( 12 ) 


and the partitioned matrix f defined by 
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r = [D”~ l Y ... Y, D^U ... t/], 

there results the least squares regression equation in standard form: 

ere = CD”Y. 

Hence, forming the normal equations for (14) and assuming that CT has rank 
(2 «), the one-shot least squares estimate is given by (prime denoting transpose) 

0 «= [r'C'CTT'r'C'CD'Y. (15) 

Here, it is assumed that a sufficient number of modulating functions have been 
chosen so that 
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(13) 

(14) 


(2M+l-n) > 2n, i.e., 2M > 3n-l (16) 

else there are fewer algebraic equations than unknowns upon which to base the 
one-shot estimate. 

In the case of sequential least squares, (14) is replaced by 

cni)6 = CD”Y(i), i = 0, 1 , 2 (17) 

where at each stage the underlying quantities are supplied by the finite Fourier 
series coefficient vectors of the input-output data taken over sequential time 
intervals [f„ /,+j], i = 0, 1, 2, .... each of duration T. Standard sequential least 
squares theory can then be applied to (17) in constructing a recursive solution 
(0(t)}. (See, for example, Mendel (1973).) In this case the number of 
modulating functions need be chosen subject only to the previously stated 
inequality: (2Af -t- 1 — «) >0, c.f. (16). 

2.1 Aspects of the least squares estimate The parallelism between the 
differential equation model (1) and the algebraic equation derived in (10) may 
suggest the interpretation that the latter is a transformed version of the former. 
However, it is important to underscore the facts that (i) (U, Y) ire finite Fourier 
series coefficient vectors extracted from the infinite dimensional transient data 
on [0, 71, and (ii) the matrix C plays the role of a projector on the finite 
dimensional space to which the computed vectors (U, Y) belong because this 
matrix acts to "strip away” the explicit influence of the unknown initial and 
boundary conditions, i.e., without this matrix the unknown boundary conditions 
on derivatives of the data would have to be appended to (10). These two 
projection aspects can be used to distinguish the Fourier based modulating 
function approach from other methods*. 

Conditions on the initial data and forcing function for (1) which would 
guarantee the existence and uniqueness of the one-shot estimate in (15), or 
convergence of the recursive estimate based on (17), have not yet been 


t An alternative method which employs state variable filters and a projection operator to annihilate 
initial condition effects of an integrated equation error function on [0, T] is discussed in Pearson 
(1976). Comparatively, it is believed that the Fourier based modulating function approach is 
superior due to the computational advantage of employing the FFT algorithm and the potential for 
ameliorating noise. 
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determined but will surely involve the notions of minimality of the model, 
excitation of system modes, and the like. As necessary conditions, it is 
straightforward to show that the system must not be in steady state operation 
and that the input must not be a linear (zero memory) feedback on the output, 
else some columns of CT will be linearly dependent. Likewise, each of the 
Fourier coefficient data vectors (U, F) must contain at least n nonzero 
components, i.e., at least n frequencies must be present, else linear depen- 
dence will occur. In this connection it can be noted from (10) and (13) that order 
determination for the model (1) (within the context of the Fourier based 
modulating function technique) relates to finding the smallest integer n for which 
the columns of the following matrix become linearly dependent: 

C [D H Y, D”' l Y, .... Y, D"- l U, .... U ]. 

Procedures for testing such linear dependence will also reflect back on condi- 
tions involving the initial data and forcing function for (1). Hence, effective order 
determination procedures and conditions for the existence and uniqueness of the 
least squares estimate are important interrelated topics for future investigation. 

2.2 Computational considerations The choice of ( T , M) can be guided in 
reference to the amplitude plot of a system transfer function sketched in Fig. 1. 


Amplitude 




Fig. 1. A frequency domain interpretation. 
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As a first consideration, which is based on the desire that the frequencies 
retained in the pair (U, Y) should cover the system bandwidth while excluding 
higher frequency noise, it is clear that the highest frequency in the modulating 
functions, Mw 0 , should be comparable to the system bandwidth W c , say 25% 
higher. Assuming W c is approximately known, a quantitative statement of this is 

Mco 0 as 1.25W C . (18) 

Likewise as shown in Fig. 1, the value of T should be chosen sufficiently large so 
as to assure reasonable resolution in distinguishing system modes of the 
transfer function, via the resolving frequency w 0 = 2n!T. However, this 
consideration is rather qualitative. A more quantitative measure of choosing T 
can be based on the bandwidth relation (18) and the number of algebraic 
equations formed from the modulating functions, i.e., (2M+l-n), together with 
the total number of such algebraic equations one wishes to use in constructing 
the least squares estimate. Thus, considering the case of a one-shot estimate, if 
it is desired that the number of such algebraic equations should approximately 
equal double the number of unknowns, then M should satisfy (2M+l-n)=4n 
which together with a> 0 = 2 n!T and (18) implies the relation (2 n unknowns 
presumed): 



Based on approximate knowledge of the system modes and a priori 
knowledge of contaminating noises in the frequency domain, it will be advan- 
tageous to be more selective in choosing the frequencies used in defining the 

modulating functions t . If m,<u 0 , i = 1, 2 M represent such frequencies 

where the m, are integers satisfying m,<m,+ i and M satisfies (2Af-n>0), the 
formulation is easily modified to reflect this selection by changing/(f) in (5) to be 
defined as . 

f{t) - col [cosmxWoL sinw^of cos sinm*^*] (20) 

and redefining D in (8) as 

*— «■* [ U"o']]- < 21 > 

The simulation results will illustrate the advantage of this flexibility*. 

The most important computational aspect of the Fourier based modulating 
function approach is the direct frequency domain interpretation afforded by the 


t For example, the zero frequency can be deleted if there is an unreliable DC value in the 
measurements. 

t It should be noted that the simple structure of the operational matrix D in (8) or (21) makes easy 
the computing of powers D\ 0 <t<«, as required in (13)-(14). Although alternative rearrange- 
ments of the sinusoids comprising fit) in (5) or (20) are readily accomodated by rearranging 
columns of C, any such rearrangements will generally alter the convenient block diagonal 
structure of D. 
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vectors (U, Y) and the efficiency with which these vectors can be computed by 
an FFT algorithm. In order to clarify this point, let z(t) denote a data function on 
[0, T] and assume uniform sampling in generating the discrete samples 
Zi = z(ih), h = TIN, i = 0, 1, .... N. Then (5) and (11) imply determining the 
following integrals (complex form): 


/' 
J o 


z(t)exp(jmw 0 t)dt, m - 0, 1 M. 


Although the real and imaginary parts of the above integral can be evaluated by 
passing z(t) through a bank of appropriately tuned harmonic oscillators, greater 
flexibility is offered by using well known digital approximations. For example, 
the standard parabolic rule yields 


J 
^ 0 


z(t)exp(jmco 0 t)dt 


Zo+z N + 4 


w-l 

2 


jv— 2 

z,W m - + 2 2 


_ z,W' 

1 = 2 , 4 ,...* 


1 


+ 0(h*), 


where W = exp(j2nlN) and o(-) is the order of the error as a function of the 
sampling interval h. Assuming N is a power of 2, the usual FFT algorithm can be 
used to evaluate the DFT of the sum on the RHS of the above yielding the 
Fourier series coefficients for m = 0, 1, .... (JV-1), i.e., 

Z = -|-FFT ^ Zo + z N , 4z u 2 z 2 4^., j. (22) 

The computational savings of this algorithm for large N are well known. 
However, a special FFT-type algorithm can be devised in consideration of the 
fact that only M Fourier coefficients are needed and that N is likely to be chosen 
much larger than M for good accuracy in the approximation. The efficiency of 
such a partial FFT algorithm can be shown to be log 2 A//A/ (Markel, 1971; Lee, 
1984). 

2.3 Simulation results for the one shot LSE 

2.3.1 Low pass system Consider first the low pass system defined by a 
Butterworth filter with bandwidth 5 [rad/sec] (see Fig. 2(a)) and the transfer 
function, 

Hl< ' s) = s 3 + 10s 2 + 50s + 125 ’ (23) 


The objective is to identify the four unknown system parameters 6 = {10, 50, 
125, 125} using the one-shot least squares estimate (15) based on time limited 
input-output data over a T - 2ji [sec] time interval with the initial conditions 
arbitrary and the input drawn from a colored gaussian random signal generator 
with bandwidth 5 [rad/sec]. A sample of the input, the corresponding noise-free 
output, and the output contaminated by an additive white gaussian noise with a 
20% RMS noise-to-signal ratio is shown in Figs. 2(b), (c) and (d), respectively. 
The Fourier series coefficients for the first M modes in the data were calculated 
using the first M components of the parabolic approximation (22) with either 
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Fig. 2. Least squares identification of H i(s). 


N = 256 or N = 512, as indicated in Fig. 2(e) f . Since the number of unknowns is 
4, the minimum number M needed for a one-shot estimate is^ based on 
(2M+1 -m)> 4, i.e., Af>3. Due to th e random nature of the signals, ten 
separate runs have been ma de for each M from 4 to 13 and the results averaged 
to yield the curves plotted in Fig. 2(e). The normalized error criterion for the 
estimated parameters is defined by 

l 


**|- 



2 

■ 100 % 


(24) 


where 6* is the true parameter value and K is the number of parameters. 


t The 1MSL (1982) was used to provide the integration routine (DVERK) (or generating the 
"continuous” data and as the source for an FFT algorithm to compute the DFT’s of the “sampled” 
data. 
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Except in the noise-free data case, which gives essentially zero error for any 
3 and either N = 256 or N = 512, the results summarized in Fig. 2(e) show 
that a minimum error is reached around M = 6 or 7. This is explained by the 
consideration that as M increases from the minimal value needed to satisfy the 
modulating function constraints (2), i.e., M = 3, the estimation error decreases 
because more information in the data is used until somewhat beyond the system 
bandwidth at which point more noise than signal will be picked up causing a 
deterioration in the identification accuracy. Taking account of the Bode plot (the 
dotted curve in Fig. 2(e)), the results are consistent with the frequency domain 
interpretation, although the identification scheme is formulated in the time 
domain. The difference between the N = 256 and N = 512 curves in the noisy 
case reflects the increased accuracy due to a smaller sampling interval h. 

2.3.2 Band pass system Consider as a second example the band pass 
system with the transfer function, 


H 2 {s) 


14.0625s 2 

s 4 + 5.325s 3 + 189.84s 2 + 466.17s + 7724.7 ' 


(25) 


This system was excited with a colored gaussian random signal of bandwidth 15 
[rad/sec] to obtain input-output data over a T = 2n [sec] time interval with 
arbitrary initial conditions. The Bode plot and an example run of the input, the 
output, and the noise corrupted output signals are shown in Figs. 3(a) -(d), 
respectively. The RMS noise-to-signal ratio of the additive white gaussian noise 
at the output is 20%. 

The simulation results summarized in Fig. 3(e) used two groups of modulat- 
ing functions comprised of frequencies defined by the following: 

Group One: {0, 1, 2 M } for M = 11, 12, ..., 17. 

Group Two: {M, M + 1, ..., 14} for M = 0, 1 9. 

In each case the curves have been obtained by averaging the results over ten 
runs. The error criterion is the same as in (24) relative to the parameter vector 
6 = {5.325, 189.84, 466.17, 7724.7, 14.0625}. The results for Group One are 
similar to those of the first example and can be explained analogously, i.e., the 
estimation error goes through a minimum as M increases from the value M = 11 
(below the bandwidth) to the value M - 17 (above the bandwidth). The results 
for Group Two can be explained on the basis that as M decreases from the value 
M = 9 the estimation error decreases because more information in the signals is 
utilized by the least squares estimate. However, the curves for both N = 256 
and N = 512 flatten out at the minimum because nothing is gained by including 
modes below the pass-band of Hence, the combined results are 

consistent with a frequency domain viewpoint in that the most significant 
information in the data has a band-pass nature like the system transfer function. 

2.3.3 Nonminimum phase system Consider as a third example the non- 
minimum phase system with the marginally stable transfer function, 
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(d) 
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Data from ten runs averaged 

x Group one: 512 Samples 
+ Group two: 512 Samples 
□ Group two: 256 Samples 


+ + + 


Mode M and frequency axis 
(e) 

Fig. 3. Least squares identification of H 2 (s). 


0.5s 3 + 2s 2 + Os + 1 


s* + Os 3 + 2.5s 2 + Os + 0.5625 


which has imaginary axis poles at (±/0.5, ±/1.5) and zeros at (-4.118, 
0.059±/0.695). The objective is to identify the parameters defined by: 6 = {0, 
2.5, 0, 0.5625, 0.5, 2, 0, 1}. This system can be expected to be more difficult to 
identify owing to the larger number of unknown parameters, i.e., 8 for (26) 
verses 4 for (23) and 5 for (25). 
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Table 1. Least squares identification of H 3 (s) 


Input 

u{t) 

[0, T) 

DFT 
Order N 

( 4-2 (8,- e ,) 2 7 


[0, 2n] 

1024 

0.06 

O.Olf 3 


512 

0.02 

[0, 4ji] 

256 

0.33 



128 

1.65 



512 

0.07 


[0, 2 n] 

256 

0.07 

DC 

128 

0.08 

+ 

Four 

sinusoids 


64 

0.26 


512 

0.003 

[0, An] 

256 

0.003 


128 

0.004 



64 

0.017 


The simulation results summarized in Table 1 illustrate some of the trade- 
offs between frequency content of the input signal, the length of the time 
interval [0, T), and the order N of the DFT used in the parabolic approximation 
(22) for the one-shot identification of (26) under noise-free data conditions*. In 
the case of the single mode input signal, u(t ) = 0.01/ 3 , over a (0, 2n] time 
interval, the identification accuracy is acceptable at ||.A0||sO.O6, but at the 
expense of a rather high order DFT (N = 1024). Doubling the time interval to 
[0, 4 jt], thus having the resolving frequency (to 0.5 [rad/sec]) results in better 
accuracy for a lower order DFT (N = 512). The corresponding results show yet 
better accuracy when the multi-modal input signal is used consisting of a DC 
(constant) plus four sinusoids. In this case acceptable accuracy is attained for 
lower order DFT’s, especially when the resolving frequency is 0. 5 [rad/sec]. 
Notice that the guideline for T given by (19) yields: T=4nn/W c = 8;r, which is 
conservative in light of the results obtained for this example. 

3. Maximum likelihood estimate 

If noise in the data cannot be effectively blocked by a judicious choice of 
modulating function frequencies then the least squares estimate may incur a 
significant bias, as is well known in regression analysis. In this case the equation 
error for (14), which is represented by 


t The number of modes M in the vector of modulating functions (4)-(5) was also varied from M = 7 
to 11 in order to cover the system bandwidth (W c &2 [rad/sec]). However, this affect on the 
identification accuracy is secondary under the noise-free conditions and hence is omitted from 
Table 1. Note the error criterion for A6 in Table 1 is slightly different than (24) for this example. 
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will not be negligible. A maximum likelihood estimate will be developed in this 
section 1 which is aimed at the asymptotic removal of the bias for the situation 
depicted in Fig. 4. Here the measured input-output data is denoted by (x(/), 
y(t)) respectively, while the noise-free input-output signals are designated by 
(«(/), w(t)). The noise signals «(0) corrupting the pair w(t)) are 

assumed to be uncorrelated zero mean stationary gaussian random processes. 
Under these circumstances the formulation of the preceding section, which led 
to the algebraic equation (10) for data over a single [0, T\ interval, can be 
rewritten as follows for data given on K nonoverlapping time intervals, [/„ 
i = 1, 2, K, each of duration T (c.f. (10)): 




X(t) y(t) 


Fig. 4. Signals for the maximum likelihood estimate. 


t The algorithm of Levin (1964) for discrete systems identification, originally due to Koopman 
(1937) and analyzed in detail by Aoki and Yu (1970 a; b), will be modified to fit the formulation of 
Sec. 2. 
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vectors for the noise-free input-output, signals («;(/), u(0), t,<t<t, + T. 
Similarly, (X it K ( ) and (V„ N,) will denote the Fourier coefficient vectors for 
(x(f), y(t)) and (v(t), n(t )) respectively. The matrix-vector equation (28) will 
henceforth be denoted by 


G K 'p K = 0. (29) 

With (U„ W ( ) in the column vector p K of (28) replaced by (X„ Y,) (or (V„ N,)), 
this new column vector will be denoted as q K (or r K ). From Fig. 4, it is easy to 
see that 

r K = (30) 

Since (v(t), w(/)) are assumed to be zero-mean gaussian processes, the Fourier 
coefficients vector r K in (30) is a gaussian vector with a covariance matrix 

R k = co v(r J[ ) 

which can be calculated as follows by assuming the auto- and cross-correlation 
functions of the noise processes are known: 


E[(c'DV i )(N j 'D'c)] 

iJi+r /»<,+r 


- c ' de \ r 


/ n{s)f(s- 


tj)ds 


•D'c 


/ », + r r> tj+T 

f(t-t i )R v , n (t,s)f'(s-t j )dsdt 


D'c. 


(31) 


where 


R, ,.n(t,s) = £[r(/)«(s)]. 


As a special case, when (v(t), n(t)) are uncorrelated white gaussian processes 
and f, + 1 -/,>T, the evaluation of (31) can be greatly simplified. By assuming 
that R k is invertible, the conditional probability density function can be written 
as follows; 

_ J . 

P(Qk\^,G K T p K = 0) = const | | ^;xp(--±- 1| ||£,) 
where | is defined by 

| = col[-l, -«! &„]. (32) 


Then the maximum likelihood estimate of 6 can be obtained by 


subject to 


|nj n ll?/f — /vll^-i > 

G K 'p K = 0. 


(33) 


Using a Lagrange multiplier vector A, the above constrained optimization 
problem is equivalent to 


.35 


Parameter identification of differential systems 257 

min i (|| q K -p K || } +A ’G K 'p K ). 

S'Pk'A * 

Taking as a first consideration the partial derivatives with respect to p K and A, 
and equating them to zero, the result is 

(?*-/>*) + £* A = 0, G K 'p K = 0. 

The preceding equations yield 

A = 2 (G/R K G K r l G g 'q K 

p K = q K -R K G K {G K 'R K G K y'G K ’q K (34) 


by assuming that ( G K 'R K G K ) is invertible. Then, it follows that 

II Qk Pk II = (Qk~PkYRk 1 ^Qk~Pk) 

= (<lK'G K )(G K 'R K G K )- l (G K 'q K ) 

= II G%' Qk || » 

where the second equality is due to (34). Thus the optimization problem (33) is 
reduced to 

min|mize||G A '« JC || ( c ([ .je Jt c*)->- (35) 

Although (35) is a nonlinear optimization problem for which the global 
optimum is in general difficult to find, an iterative algorithm will be suggested to 
find the local optimum by observing the following fact: If the weighting matrix 
(' G K 'R K G K )~ 1 is temporarily substituted by a given constant matrix W, the 
problem is equivalent to minimizing a quadratic cost. More specifically, to 
minimize with respect to £ the quadratic form, 

II G/fcIL*. (36) 

the vector G K 'q K can first be rewritten as follows (c.f. (28) and (13)); 

’c l , D"Y 1 ci'z>"- i Ti ... c'jrrir-i 
c 2 'D n Y i c 2 'D"- 1 Ti ... c 2 'X 1 ~a x 

G K 'q K = ; : : ; 

c m 'D”Y K c m 'ir- x Y K ... cjx K \[b n 

CD n Y\ CD”- 

CD”Y k CD* 

CD”Y i CA 

CD”Y K cr K 
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6 = G 22 'Q 21 , (39) 

since the symmetry of W in (37) implies Q 2 \ = Q\ 2 '. Note that for W = /, 6 in 
(39) minimizes the squares of the equation error (27), and hence is exactly the 
same solution as suggested in Sec. 2. Based on these observations, the 
following iterative algorithm is suggested: 

The iterative algorithm 

Initialization: Letting G/R K G K = /, calculate 0 o in (39). 

Step 1: Calculate 6 j in (39) by letting W = (G/fl^G*) -1 | 

Step 2: Check the convergence of the estimates. If it tends to converge and 
some more improvement is desired, replace the value of 6 o by that of 6 j and go 
to Step 1; otherwise stop. 

The initialization step minimizes the squared norm of the equation errors; in 
Step 1, the weighted squared norms of the equation errors will be minimized; 
and so forth. 

Special case 1: When (v(t),n(t)) are uncorrelated stationary white gaussian 

processes and t i+ i~ t,>T, then 



Rk = diag[/? R ... J?], 

R = co \[c 1 'D”N 1 ,c 1 'D n - 1 N 1 , ..., c,'V u 

.... c m 'D”Ni,c m 'D n ~ 1 N cJV,}. (40) 


G K ' s diag[G' G' ... G'), 
q K * colt?(l), q( 2) ... q(K)] 


(41) 
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where 


q(i) = coltd 'D"Y„ d'D'-'Y, d'Xi. 

.... c m 'D”Y t c m 'D^Y„ .... 

(35) can be reduced to 

m i n ,?i ii G ' 9(0 ii (42) 

The optimization algorithm is similar to the general case, but the dimension of 
the weighting matrix ( G'RGY 1 is reduced by a factor K compared to 
(G K 'R K G K T l in (35). 

Special case 2: In addition to the assumptions made in the Special Case 1, 

assume that m — 1, i.e., only one algebraic equation will be extracted from each 
non-overlapping time interval. Now R in (40) becomes 

R = co v[c l 'D H N l , d'V,]. (43) 


and G' in (41) becomes £'. Furthermore, 

(G’RG)- 1 = (| 'R%T l and G'q(i) = S'q(i) 

are scalars. Hence (42) can be reduced to an even simpler form as 


K 


Z 


II GW || 


2 * 

(Gtfcr 1 


_ £ SWMiTg 
~ ffi £'R£ 


£'(Zq(i)q'(i))Z 

lil 

§'R§ 


ms; 


(44) 


(45) 


where 

B = | i ?(i)?'(j). 

As is well known in matrix theory, the minimum value and the optimal solution 
for (45) are the smallest eigenvalue and its corresponding eigenvector with the 
unit first element of the following generalized eigenvalue problem: 

(B-oR)£ = 0. (46) 

The same solution has been used by Koopman (1937), Levin (1964), Aoki and 
Yue (1970 a; b) for the discrete system identification problem. In the statistics 
literatures, Sprent (1966; 1969) also suggested this solution and called it a 
“generalized least squares solution”. 
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3.1. Simulation results for the recursive ML and LS estimates The 

maximum likelihood estimate developed in the previous section was applied to 
the identification of the systems H i(s) and Hz(s) given by (23) and (25), forced 
by the same colored gaussian random processes as previously described, i.e., 
bandwidths of 5 [rad/sec] and 15 [rad/sec] respectively. Both the input and 
output signals for each system were corrupted as shown in Fig. 4 by zero mean 
white gaussian noises, uncorrelated with each other. Recursive identification 
was carried out for each system using the data collected over the successive 
time intervals [/,, /,+ 2n], i = 0, 1, 2, i.e., <o 0 = 1 [rad/sec], and a 256 

point DFT ( N = 256) was used for the parabolic approximation (22) in 
calculating the Fourier coefficient vectors of the data on each [0, 2ji] time 
interval. 

The modulating function modes were selected as (0, 1, 2, 3, 4, 5) and (7, 9, 
11, 13) respectively for the systems Hi(s) and Hzis), and the resulting 
algebraic equations, which numbered 8 and 4 respectively, were summed on 
each interval in order to meet the scalar equation requirement of Special Case 2. 
Hence, the optimization problem on each [0, 2 n] interval is to find the 
eigenvector corresponding to the smallest eigenvalue of the generalized eigen- 
value problem (46). This was accomplished using a standard subroutine in the 
IMSL (1982). 

The results of the recursive identification for each system are shown in the 
lower portions of Fig. 5 with the norm (24) plotted as a function of the number of 
time intervals. Also shown are the results of the standard recursive least 
squares algorithm applied to the same data (upper curves). The results clearly 
show that the recursive LSE incurs a bias at every noise level which can be 
effectively removed using the recursive MLE. 

4. Conclusions 

Depending on the nature and degree of completeness of a priori knowledge 
available relative to the system and noise spectral characteristics, a judicious 
choice of Fourier based modulating functions can be effective in ameliorating 
noise effects for a deterministic least squares identification based on time 
limited data. This has been verified via the simulation results for several 
examples. In the case of recursive identification involving noise corrupted data 
extracted from non-overlapping time intervals, the maximum likelihood estimate 
of Levin (1964), Aoki and Yue (1970 a; b) has been adapted to the modulating 
function formulation in order to remove the bias incurred by a standard least 
squares algorithm. Future problems include model order determination pro- 
cedures and optimal inputs for system identification within the context of the 
modulating function technique. 
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Appendix: A set of Fourier based modulating functions 

Consider the set of commensurable sinusoids 

(1, coswiO) 0 f, sinwiOoL .... cosm u (o 0 t, sin m M o) 0 t}, (A.l) 

where co 0 ~ 2 n!T and (jhj, m 2 , .... m„) are selected positive integers 
satisfying < m 2 < ... < m M . Within the (2M+1) dimensional function space 
spanned by the functions in (A.l), there will exist a (2Af+l-n) dimensional 
subspace of modulating functions satisfying (2) which can be delineated as 
follows. In order to account for the different cases when the model order n is 
even or odd, the "integer part” notation [n/2] is used: [«/ 2] stands for n/2 
when n is even, or [(n-l)/2] when n is odd. 

Cosine form: For each k = 0, 1, Af-[(n + l)/2], let 

K« + l)/2] 

<Pc,k(t) = y2 «*. y cos»»* +> a>of (A. 2) 

define a modulating function of order n with the a k] coefficients chosen such that 

4> c />(0) = <p c <'\T) =0, f = 0, 2, 4 2[(n-l)/2] (A. 3) 

and 

[0. + D/2] 

2 a k f = 1, a k , o > 0. (A. 4) 

Sine form: For each k = 1, 2 A/-[«/2], let 

[n/2] 

4> s k (t) = yZ b kJ sinm k+i a) 0 t (A. 5) 

define a modulating function of order n with the b kj coefficients chosen such that 
4>J°(0) = <P S /\T) =0, i = 1, 3, 5, 2[«/2]-l (A. 6) 

and 

[n/2] 

Xb„f = 1, b„, o>0. (A. 7) 

With respect to the end point conditions (2), notice that the vanishing of the 
odd and even derivatives was not included in (A. 3) and (A. 6) respectively, since 
these conditions are automatically satisfied by the forms assumed in (A. 2) and 
(A. 5). The existence and uniqueness of the above a k j and b kj coefficients can be 
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established by the following Lemma which also yields explicit expressions for 
their determination. 

Lemma. The coefficients a k) and b kf] specified in (A.2)-(A.7) are uniquely 
determined by Vandermonde type matrices. 

Proof for b k j. For a fixed k in l<*<M-[»/2], condition (A. 6) is equivalent 

to 


m k 


***+[»/ 2] 

m k 3 

3 

m h+ 1 

W*+[n/2] 

z{nf2]-l 

t 


2[«/2H 

W*+[n/2) 



^,0 


' o " 


Ki 

= 

0 


]> k .[ n ! 2]_ 


_ 0 _ 


or 




_ hi ‘ 


m k 



b k ,z 

- “i*. 0 

«* 3 



_ b k '[nlZ\_ 


^ 2[n/2}-l 


The above equation will be denoted as 

M k b k = — fct.o 11 **- 

By letting A* +y = m* +; 2 , ; = 1,2,..., [n/2], M k can be expressed as 


A/* = 

1 

A*+l 

1 

A*+2 

l 

A*+(n/2] 


w**+i 

0 

0 ... 
m k+2 ... 

0 

0 


a. l^M 
A *+ 1 

a *+2 

A* + », 


0 

0 ... 

>”*+[«/ 21 


. (A. 8) 


The first matrix on the right hand side of (A. 8) is the well-known Vandermonde 
matrix; its determinant and inverse matrix can be expressed in explicit forms 
(Graybill, 1983). Hence, Af*" 1 can be explicitly expressed as 


M„ 


-l 


1 

n.(m* + , 2 -m* +; 2 ) 

i>) 


1 


«*+i 

0 


0 

1 


«*+2 


0 
0 

1 

»**+[«/ 2jJ 


(i,;')-element 




P,(m k+ ?) J 


where 
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l«/2] 


[»/2J 


**,•(*) S n (x-m* + /) S 2 for i = 1 , 2 , 

So, 6* can be expressed as 

bk = ~b k qM*' 1 !/!*, 


[«/2]. (A. 9) 


(A. 10) 


and the condition (A. 7) implies that 

f»*,o 2 (l + m* r A/*- r A/*- 1 m*) = 1, and b k , 0 > 0. (A. 11) 


It is clear that the value of b k , o is uniquely specified as 

b k , o = ( 1 + m k T M k - T M k ~' m k )\ ( A . 1 2 ) 


Hence {&*,} can be uniquely found by combining (A. 9), (A. 10), and (A. 12). The 
derivation and proof for a k ; is similar and hence is omitted. 

It is convenient for identification purposes to collect the 2 M + 1 - n modulat- 
ing functions into a single column vector, 




<*>(0 = 


0<:,*-[<*i + l)+/2]^ • 


(A. 13) 




It is easy to see that these (2Af+l-M) modulating functions are linearly 
independent, since in the construction procedure (A. 2) to (A. 7) a new sinusoid 
is added as k is increased. Hence £>(/) consists of a set of basis functions for the 
(2M+1-m) dimensional modulating function space which is a subspace of the 
(2A/+1) dimensional trigonometric function space spanned by the functions in 
(A.l). Let f(t) denote the column vector of 2M+1 sinusoids: 

/(/) s col[l, cosmiO) 0 f, sinMii<w 0 f, .... cosm^ouof, sinm^toofL 

0<t<T. 

Then #(f) in (A. 13) can be represented as 

<P(t) = Cf(t), (A. 14) 

where C is an (2A/+l-n)x(2A/+l) matrix determined by the {a ki , b kj ) in 
(A.2MA.7). 

The above results are summarized in the following Theorem. 

Theorem. In the (2Af+l)-dimensional vector space spanned by the set in 
(A.l), there exits a (2A/+ 1 - n)-dimensional subspace of modulating functions 
of order n represented by the vector function 4>(() in (A. 14). The matrix C in 
(A. 14) has rank (2M+1-m). 
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Abstract 

Several variants are presented of a linear-in-parameters least squares formulation for determin- 
ing the transfer function of a stable linear system at specified frequencies given a finite set of 
Fourier series coefficients calculated from transient nonstationary input-output data. The basis 
of the technique is Shinbrot’s classical method of moment functionals using complex Fourier 
based modulating functions to convert a differential equation model on a finite time interval 
into an algebraic equation which depends linearly on frequency-related parameters. 
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1. INTRODUCTION 

Methods for determining the transfer function of a stable linear system from input-output 
data include correlation and spectral analyses, as well as the direct sinusoidal measurements. 
Each of these “nonparametric” identification techniques require either a statistical stationarity 
assumption on the data, or a periodic steady state condition to be established, before initiating 
calculations of the transfer function at pertinent frequencies. Excellent summaries of these 
methods, as well as the analysis of noise effects and finite data lengths, can be found in 
Astrom [1], Ljung [2], Soderstrom and Stoica [3], and Unbehauen and Rao [4]. Notwith- 
standing noise considerations, long data lengths may be required in order to achieve good 
accuracy due to the stationarity or steady state assumption. By contrast, a method is proposed 
here that utilizes the frequency content in short data lengths in order to set up a least squares 
estimation of the transfer function at selected frequencies. Since short data lengths are used 
there is no assumption of steady state operation or stationarity of the data, though there must 
be present sufficient energy content in the data at the specified frequencies in order to avoid 
degeneracy in the least square estimate. The basis of the technique is the classical Shinbrot 
[5] method of moment functionals, also known as the modulating function technique, using 
complex Fourier based modulating functions. A forerunner of this approach can be found in 
Pearson and Lee [6] which utilized real valued Fourier based modulating functions, i.e., com- 
mensurable sinusoids, for the parameter estimation of linear differential systems. Therein also 
can be found a discussion of the background of this method with a listing of references. 
However, this paper appears to represent the first use of modulating functions in the context 
of the “nonparametric” system identification problem. Several variants of a deterministic 
least squares estimation of frequency-related parameters that underlie the transfer function will 
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be developed below. 

2. Least Squares Formulations 

Consider a stable linear system with input u(t) and output y(t) modeled on a finite time 
interval by the differential operator equation: 

A(p)y(t)=B(p)u(t)+e(t) (1) 

where (A(p)Jl(p )) are polynomials in the differential operator p =d Idt of degree less than or 

equal to an a priori integer n , and e ( t ) represents the effect of modeling errors. The problem 
considered here is to estimate the transfer function G(jas)=B (J co)M (J co) at a finite set of fre- 
quencies {k(O 0 , k= 1,2 • • M), where co 0 is a user selected “resolving” frequency and M a 
chosen integer, given the input-output data [w(r),y(r)l over a finite set of time intervals 
{[fj.fj+To], i=l, • -N)} These time intervals are each chosen of length T 0 =27t/co 0 and need 
not necessarily be disjoint. However, a certain degree of independence in the data collected 
over the different [0,7 0 ] time intervals is necessary in order to avoid degeneracies in the least 
squares estimate to be discussed below. Understandably these degeneracies are more likely to 
be avoided in normal operating records if the intervals are disjoint. In addition to the upper 
bound on the system order n , the DC value of the transfer function is assumed given or can 
be measured from the step response, i.e., G (0)=B (0 )/A (0) is presumed known a priori. If 
this is not the case, then the estimated transfer function can be scaled by the parameter G (0). 

The Shinbrot method of moment functionals is a technique for converting a differential 
equation on a finite time interval into an algebraic equation by the use of “modulating func- 

1 If the system bandwidth £0# is known, then choosing (M ,C 0 q) such that M COg-COg will cover 
the bandwidth at the knots k (Oq, k=l,2 • • M. 
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tions”. As introduced by Shinbrot [5], <J>(0 is a modulating function of order n relative to a 
finite time interval 0 <!r^7 q if it is sufficiently smooth and satisfies the end point conditions: 

<J) (,) (0)=<|> (i) (r 0 )=0, /=0,1 ■ • n-1 (2) 

where )=^> * <J>(r ). Clearly, modulating functions can be constructed in many different 

ways. 2 Here a specific set of complex valued Fourier based modulating functions is defined in 

a way that is conducive to solving the problem at hand, viz., let 

(\-e jw ) n , 0<t£T o=27t/(D 0 ( 3a ) 

m=0,l • • M 

define a set of modulating functions of order n with respect to the time interval [0,7 0 ], 
Equivalently by the binomial expansion, each such function is representable by 

<UO=XV y(m+ * )ov . (3b) 

*=o 

where b k is defined in relation to the binomial coefficient by 

h = (-»* (j[]. < 4 > 

The first representation (3a) makes evident the fact that each 4> m (f ) is indeed a modulat- 
ing function of order n , i.e., (2) is satisfied, 3 while the second representation (3b) implies that 
calculating linear functionals defined by each <j> m (r) on a set of functions specified over [0,7 0 ] 
will entail calculating the Fourier series coefficients of these functions at the frequencies k coq, 
k=mjn+l • • m+n, m=0,l ■ • M. In turn, these coefficients can be calculated efficiently by 
DFT/FFT methods which provides an important motivating factor for this analysis. This will 

2 See discussion in Pearson and Lee [6]. 

3 Notice that any modulating function of a fixed order is automatically a modulating function of 
any lower order relative to the same time interval. This property facilitates the formulation for any 
system of order less than or equal to the upper bound n . 
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be discussed further below. The important property of the functions defined in (3) is con- 
tained in the following 4 

Modulation Property. Let P(p) be a differential operator of order at most n, i.e., a 
polynomial in p=d/dt of degree <n, and z(r) any sufficiently smooth function defined 
on [0,r 0 ]. Then the modulation of P(p)z(t) with <|> m (r) over [0,T 0 ] satisfies 

m+n 

J <|> m (t )P (p )z (t )dt = X b k-m P H* ®o )Z-k (5) 

0 k=m 

where Z k is the k th harmonic Fourier series coefficient of z(t), i.e., 

T 

Z k = jz(t)e' jk(0ct dt. (6) 

Note that owing to the end point constraints (2) satisfied by each <|> m (r ) function, none of the 
boundary point derivatives z (i) (0) or z (i) (T 0 ) appear in (5). This is crucial to the ensuing 
analysis and, in fact, represents a primary reason for employing the modulating function tech- 
nique. 

2.1. Formulation 1 

A direct application of the above property to the problem posed involves rewriting the 
differential operator model (1) in the equation error form followed by the modulation with 

<L(0; thus, 5 


4 Proof of this property is given in the Appendix in order to proceed directly with the develop- 
ment. 

5 The process of going from the model (1) to equation (7) can be viewed as a projection from a 
space of functions on [0,T 0 ] down into a finite dimensional space spanned by the modulating func- 
tions. 
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| <J> m (f ) |A (p )y ( t )-B (p )u (r ) Jdr = J 4»m (' )e ( t )dt. 


In view of (5) the preceding equation is equivalent to 


m+n f "1 m+n 

Zh-m \AHk(o 0 )Y. k -B(-jka 0 )U_ k = Zh-m^-k- 

k=m 1 J k=m 


Define the real and imaginary parts of the polynomials ( A (Jk (Hq)B (jk ®o)) as follows: 


(7) 


( 8 ) 


A (jk co 0 )=a* +j (i* , B(jk co 0 )^Y k + j h ( 9 ) 

and collect these together to form the 4x1 “parameter” vector: 

a k 

8 , = 5 * ' (10) 

h. 

Also, define as follows the 2x4 data matrix y*(0 in terms of the real and imaginary parts of 
the k ,h harmonic Fourier series coefficients of the input-output data corresponding to the time 
interval [r, ,t i +T 0 ], / =1,2 • -N: 

Y c k (i) Y* k (i) -U k (i) -Ul(i) 

V * 0)= -Y k (i ) Y£(i) U£(i) - U c k (i ) 

The notation for the entries in (11) is explained by 

T 0 To 

Y k (i ) = f y (r -K, )cos k c%r dt, Y s k (i ) = f y ( t +/,■ )sin£ co 0 t dt (12) 

o o 

and similarly for (U k (i),U k (i)). Then the real and imaginary parts of the equation error (8) 
can be collected into the following real valued 2x1 vector equation which serves as the start- 
ing point for a least squares estimation: 



m+n 

00* =e m (0 


k=m 


( 13 ) 


m=0,l ' • M , i=l,2 • • N. 
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The equation error vector in (13) is related to the Fourier coefficients of the original equation 
error by 


m+n 

O' ) = bk-m 

k=m 

Values of the transfer function G (jk co 0 )=5 (jk co 0 )/^ (jk ®o) are seen from (9) and (10) to 
be related to the parameter vector Q k by the real and imaginary part relations: 


m) 

W).' 


ReG (jk u o) : 


<*a+P* 


ImG (jk (0 0 )= 


«*+P * 


or equivalently by the magnitude-phase relations: 


(14) 


Y?+8t , 5 k . Pi 

|G (/£ g>o)P = G(jkta k ) = tan -1 — tan -1 ——. (15) 

a*+P* Y* a k 

Starting from a presumed knowledge of the DC value G (0), which implies that 
00 = 1/1 (0),03 (0),0]' is known, equation (13) can be rewritten in the standard regression equa- 
tion format to estimate the parameters Q k , £=1,2 • • M+n given the data over a sufficient 
number of [f l -,f J -+7 , Q], intervals, i=l,2 • • N. A consideration of this equation reveals the fol- 


lowing: 

1) The frequency range covered by the parameters in (13) is (M+n )co 0 . Hence, if it 
is desired that the transfer function estimate cover a frequency range about 25% 
greater than the system bandwidth to# at a resolution cdq, a choice in (M ,co 0 ) such that 


(M +n )o 0 = 1 .25co fi (16) 

reflects this objective. 

2) Counting unknowns in (10) and (13), the total number is A(M+n). Since each 
equation in (13) is of dimension 2, counting equations suggests that the total number 
N of [tiJi+T 0 ] intervals should satisfy: 2(M+\jN^A(M+n), i.e., N^2(M +n)/(M +1). 
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However, the equations in (13) are partially decoupled with respect to the index m. 
Therefore, it seems best to estimate (0 1# • • 6„) at the first stage, which corresponds to 
m= 0. This means that there are 4 n unknowns for the first stage requiring that N 
satisfy N>2n. Thereafter, the number of unknowns is just 4 for each stage 
corresponding to m =1,2 •• M , which implies N^2 assuming that the preceding esti- 
mates are used in each succeeding stage. This kind of “bootstrapping” of the least 
squares estimation facilitates keeping the number of unknowns to a modest level at 
each stage. 

3) The two row vectors comprising \\r k ( i) in (11) are seen to be mutually orthogonal 
for each k and i suggesting a maximal degree of independence for these equations in 
utilizing the information content in the data. This is a direct result of the Fourier 
nature of the underlying formulation. 

Discussion: The above development shows that it is possible to formulate a linear-in-the 
parameters least squares estimation problem for parameters (10) that underlie (via (14) or 
(15)) the transfer function C(/jI:co 0 ) at each k th harmonic frequency. The input-output data 
can be time-limited and transient, but must have sufficient energy content at the specified fre- 
quencies to avoid degeneracies in the least squares solution. Apart from being highly non- 
linear, the relations (14) and (15) involve the difference between parameter related quantities, 
e.g., a*Y*-P*5*> whose values may be large for large k. This aspect of the formulation por- 
tends a potential source of error magnification which is alleviated by the formulation of the 


next section. 
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2.2. Formulation 2 

Given that [u (t),y(t)] satisfies the model (1) on a [0,r 0 ] time interval, it follows that 
[«(f)*y(0] also satisfies the model 

A {-p )A (p )y (r )=A (-p )B ip )u ( t)+A (-p )e it). (17) 

Choosing a set of modulating functions of order In to accommodate the upper bound on the 

highest degree differential operator in (17) and modulating this equation with the m ,h member 

of this set, the following projected equation error results which is analogous to (8): 

m+2n r 1 m+2n_ 

Z bk-m \A ijk co 0 )A i-jk(d 0 )Y_ k -A (jka 0 )B (-jk(£> 0 )U_ k 1= £ b k _ m A (jka 0 )E_ k (18) 
k=m J- J k=m 

where b k is defined by, cf. (4): 

b t = <-D‘ ( 2 c). 

Noting that A ijk co 0 )A i~jk co 0 ) is real while A ijk (o 0 )B (-jk a> 0 ) is complex, define real quanti- 
ties ia k .a^Pj.) by the relations: 

a k =A ijk to 0 )A (-jk co 0 ), a k +j ^ k =A ijk a^B (-jk (O q) (19) 

and collect these into the 3x1 parameter vector 0* defined by 

_ a k 

Q k = ct k . ( 20 ) 

Pi _ 

Also, define the 2x3 data matrix \| t k (i ) by 

Y£(i) -U c k (i) mu ) 

v * (,)= Yin) -m ( o -ma) 

where the notation for the entries in (21) is the same as defined in (12). Then the real and 
imaginary parts of the projected equation error (18) can be represented by the following real 
2x1 vector equation: 
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TO + 2/1 

X bk-mVk O' )&k = O' ) (22) 

Jt=TO 

m=0,\--M, i=l,2--N. 

Equation (22) can be rewritten into the standard regression equation format for setting up 
the least squares estimate of the parameters (0j, • • $M+ 2 n) based on the data over [r,,r,+r 0 ], 
i=l,2 • • N. Again, presumed knowledge of the DC value implies that 6 q=A (0)[<4 (0),B (0),0]' 
is known or, if not, the resulting transfer function estimate can be scaled by the parameter 
G (0). Here the estimates of the transfer function are related to the parameters by the real and 
imaginary part equations (as found from (19) and (20)): 

ReG (jk co 0 )=— , ImG ( jk co 0 )=-— (23) 

a k a k 

or equivalently by the magnitude-phase relations: 

|GO*c> 0 )P = -^, G (jk to 0 ) = -tan -1 . (24) 

Qk a * 

Consideration of the least squares formulation in this case leads to the following: 

1) The frequency range covered by the parameters in (22) is (M +2n )co 0 ; hence, the 
guideline (analogous to (16)) for choosing the pair (M ,o) 0 ) in this case is 

(M+2n)coo = 1.25co fi . (25) 

2) Counting unknowns in (20) and (22), the total number is 3(M+2n ) which would 

imply that the total number N of [f l -,/ l +7’ 0 ] time intervals should satisfy: 
2N (M +\)>3(M +2n). However, the partially decoupled nature of the equation (22) 
with respect to the m index suggests bootstrapping the solution from the first stage. 
Thus, for the initial stage (m= 0), N needs to satisfy N>3n. In the succeeding stages, 
N needs to satisfy N>2 (since there are 3 unknowns and 2N equations at each such 
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stage); this assumes that the preceding estimates arc used at each succeeding stage 
(m= 1,2 • • M). 

Discussion: Comparing (23) and (24) with (14) and (15) reveals that the second formulation 
avoids the potential error magnification problem of differencing large estimated quantities in 
calculating the transfer function at high frequencies. However, the second formulation 
requires estimating 6 n unknowns at the first stage, i.e., the m=0 stage, verses 4 n unknowns 
for the first stage of the first formulation. 


2.3. Formulation 2-Dual 

The dual to the formulation of the preceding section is to observe that a given pair 
[u(r)j(0] satisfying the model (1) on a [0,T 0 ] time interval also implies that it satisfies (cf. 

(17)) 


B(-p)A(p )y (t )=B (-p )B (p )u (t )+B (-p )e ( t ). (26) 

Again choosing a set of modulating functions of order 2n, a development similar to that of 

the previous section leads to the real 2x1 vector equation 

m+2n__ 

X t> k -mVk(i)Q k =£/n(0 (27) 

k=m 

m=0,l • • M , i'=l,2 • • N 

where the data matrix \\r k ( i ) is defined by 


V*( 0 = 


Ulii) - Y c k ii ) YHi) 
U s k U ) -Y k (i) ~Y k (i ) 


and the real 3x1 parameter vector 0* is defined by 


(28) 
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with the entries in (29) defined by 



bk 

a* 

-P* 


(29) 


b k =B ( jk (Do )£ (-/* CDo), a*+/ P* =A(jk (Oq)B (-jk co 0 ). (30) 

Relations between the transfer function and the parameters in this case are found to be 

a k b k -Pjfcfyt 

ReG (Jk co 0 )= — > ImG (jk o> 0 )=^— r (31) 

a* 2 +p * 2 a*+P* 

for the real and imaginary parts, or for the magnitude-phase: 

b 2 R 

p(Jkaof = -^-r, G (jkoio) = -tan" 1 ——. (32) 

Comparing (21) and (28) verifies the duality of the two formulations by virtue of the 
interchange of input and output. Note that each formulation has the same total number of 
unknowns - in general. However, the dual formulation has the potential advantage of reduc- 
ing the total number of unknowns in the event of a priori information on a lower degree 
numerator polynomial than denominator polynomial in the transfer function. For example, an 
“all pole” model means that b k =(B( 0)) 2 is known for all k, i.e., a total of 2(M +n ) unknowns 
verses 3(M+2n) for the previous case. The formulation leading to (27) can easily be 
modified to reflect this consideration. 


3. Conclusions 

Three formulations of a linear-in-the parameters least squares estimation have been 
presented for determining the transfer function of a linear system at specified frequencies 
given transient nonstationary input-output data. While some comparisons have been noted in 
the discussions following each formulation, a clear indication of the pitfalls and advantages of 
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each will have to await a thorough simulation study including the effects of noise. Such a 
study is currently underway. 
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5. Appendix 

To verify the modulation property (5) it is sufficient to consider the differential operator 

P(p)=p ' for a fixed i£n. The result for a general n th degree polynomial Pip) then follows 

by superposition. Thus, for sufficiently smooth z(t) on [0 ,Tq] and <\> m (t) defined in (3), the 

left side of (5) in this case is 

T 0 T 0 

|<t>m (t)p l z(t)dt = (-1)' 

where integration-by-parts has been used i times taking into account the boundary conditions 
(2) possessed by each <> m (r ) function. Substituting the representation (3b) into the right side 
of (33), carrying out the indicated differentiation and changing the index of summation 
verifies (5) for P ip )=p ' as purported. 



